Re: Add tracking of backend memory allocated to pg_stat_activity

2023-09-02 Thread Ted Yu
On Thu, Aug 31, 2023 at 9:19 AM John Morris 
wrote:

> Here is an updated version of the earlier work.
>
> This version:
>1) Tracks memory as requested by the backend.
>2) Includes allocations made during program startup.
>3) Optimizes the "fast path" to only update two local variables.
>4) Places a cluster wide limit on total memory allocated.
>
> The cluster wide limit is useful for multi-hosting. One greedy cluster
> doesn't starve
> the other clusters of memory.
>
> Note there isn't a good way to track actual memory used by a cluster.
> Ideally, we like to get the working set size of each memory segment along
> with
> the size of the associated kernel data structures.
> Gathering that info in a portable way is a "can of worms".
> Instead, we're managing memory as requested by the application.
> While not identical, the two approaches are strongly correlated.
>
>  The memory model used is
>1) Each process is assumed to use a certain amount of memory
>simply by existing.
>2) All pg memory allocations are counted, including those before
>the process is fully initialized.
> 3) Each process maintains its own local counters. These are the
> "truth".
>  4) Periodically,
> -  local counters are added into the global, shared memory
> counters.
>  - pgstats is updated
>  - total memory is checked.
>
> For efficiency, the global total is an approximation, not a precise number.
> It can be off by as much as 1 MB per process. Memory limiting
> doesn't need precision, just a consistent and reasonable approximation.
>
> Repeating the earlier benchmark test, there is no measurable loss of
> performance.
>
Hi,
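To make sure I'm reading the counting scheme correctly, here is a minimal
standalone sketch of the fast-path / periodic-flush model as I understand it
(the names and the flush threshold are placeholders, not the patch's actual
code):

#include <stdatomic.h>
#include <stdint.h>

/*
 * Sketch only: each backend keeps exact local counters ("the truth") and
 * folds the accumulated drift into the shared, approximate total once the
 * drift exceeds a threshold, so the fast path touches no shared memory.
 */
#define REPORT_THRESHOLD (1024 * 1024)          /* placeholder: ~1 MB of drift */

static _Atomic int64_t total_bkend_mem_bytes;   /* shared, approximate total */

static int64_t local_allocated_bytes;           /* exact, per-process */
static int64_t local_unreported_bytes;          /* not yet folded into the total */

static void
account_alloc(int64_t size)
{
    /* fast path: only two local variables are updated */
    local_allocated_bytes += size;
    local_unreported_bytes += size;

    /* slow path: occasionally push the drift into the shared counter */
    if (local_unreported_bytes >= REPORT_THRESHOLD ||
        local_unreported_bytes <= -REPORT_THRESHOLD)
    {
        atomic_fetch_add(&total_bkend_mem_bytes, local_unreported_bytes);
        local_unreported_bytes = 0;
    }
}
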
In `InitProcGlobal`:

+elog(WARNING, "proc init: max_total=%d  result=%d\n",
max_total_bkend_mem, result);

Is the above log for debugging purposes? If so, maybe the log level should be
changed.

+
errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <=
100MB",

The `<=` in the error message is inconsistent with the `max_total_bkend_mem
< result + 100` check.
Please modify one of them.

For `update_global_allocation`:

+
Assert((int64)pg_atomic_read_u64(>total_bkend_mem_bytes) >= 0);
+
Assert((int64)pg_atomic_read_u64(>global_dsm_allocation) >= 0);

Should the assertions be done earlier in the function?

For reserve_backend_memory:

+   return true;
+
+   /* CASE: the new allocation is within bounds. Take the fast path. */
+   else if (my_memory.allocated_bytes + size <= allocation_upper_bound)

The `else` can be omitted (the preceding if block returns).

For `atomic_add_within_bounds_i64`:

+   newval = oldval + add;
+
+   /* check if we are out of bounds */
+   if (newval < lower_bound || newval > upper_bound ||
addition_overflow(oldval, add))

Since the sum is already stored in `newval`, you can pass `newval` to
`addition_overflow` so that `addition_overflow` doesn't have to add the operands again.
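
For illustration, a minimal standalone sketch of that shape (made-up names; it
uses the compiler's overflow-checking add, where the real code would presumably
use something like pg_add_s64_overflow):

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: do the addition once, let the overflow-checking add report on
 * that same sum, and reuse the sum for the bounds check.
 */
static bool
add_within_bounds(int64_t oldval, int64_t add,
                  int64_t lower_bound, int64_t upper_bound,
                  int64_t *newval)
{
    /* one addition; overflow is detected without re-adding the operands */
    if (__builtin_add_overflow(oldval, add, newval))
        return false;

    return *newval >= lower_bound && *newval <= upper_bound;
}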

There are debug statements, such as:

+   debug("done\n");

They can be dropped in the next patch.

Thanks


Re: [jackson-user] NoSuchMethodError: com.fasterxml.jackson.core.JsonParser.getReadCapabilities()

2023-06-07 Thread Ted Yu
One approach we're trying now is to upgrade jackson-core and jackson-databind
from 2.9.8 to 2.13.3 across the board (without upgrading the projects that
depend on jackson-core and jackson-databind).

Tatu: do you anticipate any issues at runtime?

Thanks

On Wednesday, June 7, 2023 at 4:25:20 AM UTC-7 Joo Hyuk Kim (김주혁) wrote:

> This definitely sounds like a transitive dependency issue.
>
> What I can suggest (sadly not a solution) is to do some searching around
> Maven on how to specify dependency versions, overrides, etc.
> On Wednesday, June 7, 2023 at 4:16:48 AM UTC+9 Tatu Saloranta wrote:
>
>> On Tue, Jun 6, 2023 at 12:04 PM Ted Yu  wrote: 
>> > 
>> > Thanks for taking a look. 
>> > 
>> > The jar is produced with `maven-shade-plugin`. Do you have suggestion 
>> on how I can trace down the origin of 2.6.7 databind in such scenario ? 
>>
>> Something in Maven (etc) build would have dependency. But I am not 
>> sure 2.6.7 of databind should be problematic. 
>> `JacksonFeatureSet` was added in 2.12.0 of `jackson-core` so version 
>> of databind would need to be 2.12 or later (to expect it). 
>> So something is providing `2.6.x` of `jackson-core`; possibly shaded 
>> into some other artifact... so it could definitely come from 
>> non-jackson jar. :-( 
>> I don't know how to figure out where ClassLoader gets particular 
>> classes; chances are it's necessary to see what various jars included 
>> in classpath contain. 
>>
>> -+ Tatu +- 
>>
>> > 
>> > Cheers 
>> > 
>> > On Tuesday, June 6, 2023 at 11:55:07 AM UTC-7 Tatu Saloranta wrote: 
>> >> 
>> >> Version 2.6 is not supported (and hasn't for a while), so I am not 
>> >> sure how much we can help here with specific details. 
>> >> 
>> >> But exception message does suggest a version discrepancy: not between 
>> >> 2.6.7.1 and 2.6.7 (those are compatible being patch/micro-path within 
>> >> same minor release), but by something having later version 
>> >> (jackson-databind from looks) and requiring matching-or-later 
>> >> `jackson-core`. 
>> >> So you do not have a consistent set of Jackson components. 
>> >> 
>> >> -+ Tatu +- 
>> >> 
>> >> On Tue, Jun 6, 2023 at 11:52 AM Ted Yu  wrote: 
>> >> > 
>> >> > Hi, 
>> >> > We encounter the error shown at the end. 
>> >> > 
>> >> > Looking at 
>> META-INF/maven/com.fasterxml.jackson.core/jackson-databind/pom.xml in the 
>> fat jar: 
>> >> > 
>> >> > com.fasterxml.jackson.core 
>> >> > jackson-databind 
>> >> > 2.6.7.1 
>> >> > jackson-databind 
>> >> > 
>> >> > But I don't see 2.6.7 in any pom.xml in our repository. 
>> >> > I checked dependency:tree output as well. 
>> >> > 
>> >> > I wonder if someone has hint on how to find where the 2.6.7 
>> dependency came in. 
>> >> > 
>> >> > Thanks 
>> >> > 
>> >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
>> >> > at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>  
>>
>> >> > at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>  
>>
>> >> > at java.lang.reflect.Method.invoke(Method.java:498) 
>> >> > at 
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>>  
>>
>> >> > ... 13 more 
>> >> > Caused by: java.util.concurrent.ExecutionException: 
>> java.lang.NoSuchMethodError: 
>> com.fasterxml.jackson.core.JsonParser.getReadCapabilities()Lcom\/fasterxml\/jackson\/core\/util\/JacksonFeatureSet;
>>  
>>
>> >> > at 
>> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
>>
>> >> > at 
>> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) 
>> >> > ... 
>> >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
>> >> > at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>  
>>
>> >> > at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>  
>>
>> >> > at java.lang.reflect.Method.invoke(Method.jav

Re: [jackson-user] NoSuchMethodError: com.fasterxml.jackson.core.JsonParser.getReadCapabilities()

2023-06-06 Thread Ted Yu
Thanks for taking a look.

The jar is produced with `maven-shade-plugin`. Do you have a suggestion on
how I can trace down the origin of the 2.6.7 databind in such a scenario?

Cheers

On Tuesday, June 6, 2023 at 11:55:07 AM UTC-7 Tatu Saloranta wrote:

> Version 2.6 is not supported (and hasn't for a while), so I am not
> sure how much we can help here with specific details.
>
> But exception message does suggest a version discrepancy: not between
> 2.6.7.1 and 2.6.7 (those are compatible being patch/micro-path within
> same minor release), but by something having later version
> (jackson-databind from looks) and requiring matching-or-later
> `jackson-core`.
> So you do not have a consistent set of Jackson components.
>
> -+ Tatu +-
>
> On Tue, Jun 6, 2023 at 11:52 AM Ted Yu  wrote:
> >
> > Hi,
> > We encounter the error shown at the end.
> >
> > Looking at 
> META-INF/maven/com.fasterxml.jackson.core/jackson-databind/pom.xml in the 
> fat jar:
> >
> > com.fasterxml.jackson.core
> > jackson-databind
> > 2.6.7.1
> > jackson-databind
> >
> > But I don't see 2.6.7 in any pom.xml in our repository.
> > I checked dependency:tree output as well.
> >
> > I wonder if someone has hint on how to find where the 2.6.7 dependency 
> came in.
> >
> > Thanks
> >
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
> > ... 13 more
> > Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.core.JsonParser.getReadCapabilities()Lcom\/fasterxml\/jackson\/core\/util\/JacksonFeatureSet;
> > at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> > at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> > ...
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at 
> org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:344)
> > ... 31 more
> > Caused by: java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.core.JsonParser.getReadCapabilities()Lcom\/fasterxml\/jackson\/core\/util\/JacksonFeatureSet;
> > at 
> com.fasterxml.jackson.databind.DeserializationContext.(DeserializationContext.java:211)
> > at 
> com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.(DefaultDeserializationContext.java:50)
> > at 
> com.fasterxml.jackson.databind.deser.DefaultDeserializationContext$Impl.(DefaultDeserializationContext.java:391)
> > at 
> com.fasterxml.jackson.databind.deser.DefaultDeserializationContext$Impl.createInstance(DefaultDeserializationContext.java:413)
> > at 
> com.fasterxml.jackson.databind.ObjectMapper.createDeserializationContext(ObjectMapper.java:4656)
> > at 
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4585)
> > at 
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
> > at 
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3531)
> > ...
> > at 
> org.hibernate.metamodel.model.convert.internal.JpaAttributeConverterImpl.toDomainValue(JpaAttributeConverterImpl.java:45)
> > at 
> org.hibernate.type.descriptor.converter.AttributeConverterSqlTypeDescriptorAdapter$2.doConversion(AttributeConverterSqlTypeDescriptorAdapter.java:140)
> > at 
> org.hibernate.type.descriptor.converter.AttributeConverterSqlTypeDescriptorAdapter$2.extract(AttributeConverterSqlTypeDescriptorAdapter.java:121)
> > at 
> org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:261)
> > at 
> org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:257)
> > at 
> org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:247)
> > at 
> org.hibernate.type.AbstractStandardBasicType.hydrate(AbstractStandardBasicType.java:333)
> >
> > --
> > You received this message because you are subscribed to the Google 
> Groups "jackson-user" group.
> > To unsu

[jackson-user] Re: NoSuchMethodError: com.fasterxml.jackson.core.JsonParser.getReadCapabilities()

2023-06-06 Thread Ted Yu
One interesting observation: there is no 2.6.7 directory on the local computer:

 ls -lt 
/Users/zhihongyu/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.6.
2.6.5/ 2.6.6/

This seems to imply that the 2.6.7 databind may have come from another fat jar.
On Tuesday, June 6, 2023 at 11:52:19 AM UTC-7 Ted Yu wrote:

> Hi,
> We encounter the error shown at the end.
>
> Looking 
> at META-INF/maven/com.fasterxml.jackson.core/jackson-databind/pom.xml in 
> the fat jar:
>
>   com.fasterxml.jackson.core
>   jackson-databind
>   2.6.7.1
>   jackson-databind
>
> But I don't see 2.6.7 in any pom.xml in our repository.
> I checked dependency:tree output as well.
>
> I wonder if someone has hint on how to find where the 2.6.7 dependency 
> came in.
>
> Thanks
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
> ... 13 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.core.JsonParser.getReadCapabilities()Lcom\/fasterxml\/jackson\/core\/util\/JacksonFeatureSet;
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> ...
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:344)
> ... 31 more
> Caused by: java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.core.JsonParser.getReadCapabilities()Lcom\/fasterxml\/jackson\/core\/util\/JacksonFeatureSet;
> at 
> com.fasterxml.jackson.databind.DeserializationContext.(DeserializationContext.java:211)
> at 
> com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.(DefaultDeserializationContext.java:50)
> at 
> com.fasterxml.jackson.databind.deser.DefaultDeserializationContext$Impl.(DefaultDeserializationContext.java:391)
> at 
> com.fasterxml.jackson.databind.deser.DefaultDeserializationContext$Impl.createInstance(DefaultDeserializationContext.java:413)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.createDeserializationContext(ObjectMapper.java:4656)
> at 
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4585)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3531)
> ...
> at 
> org.hibernate.metamodel.model.convert.internal.JpaAttributeConverterImpl.toDomainValue(JpaAttributeConverterImpl.java:45)
> at 
> org.hibernate.type.descriptor.converter.AttributeConverterSqlTypeDescriptorAdapter$2.doConversion(AttributeConverterSqlTypeDescriptorAdapter.java:140)
> at 
> org.hibernate.type.descriptor.converter.AttributeConverterSqlTypeDescriptorAdapter$2.extract(AttributeConverterSqlTypeDescriptorAdapter.java:121)
> at 
> org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:261)
> at 
> org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:257)
> at 
> org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:247)
> at 
> org.hibernate.type.AbstractStandardBasicType.hydrate(AbstractStandardBasicType.java:333)
>

-- 
You received this message because you are subscribed to the Google Groups 
"jackson-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jackson-user+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jackson-user/5a6f2695-715c-4719-8563-10b5ac9d3e2cn%40googlegroups.com.


[jackson-user] NoSuchMethodError: com.fasterxml.jackson.core.JsonParser.getReadCapabilities()

2023-06-06 Thread Ted Yu
Hi,
We are encountering the error shown at the end.

Looking 
at META-INF/maven/com.fasterxml.jackson.core/jackson-databind/pom.xml in 
the fat jar:

  com.fasterxml.jackson.core
  jackson-databind
  2.6.7.1
  jackson-databind

But I don't see 2.6.7 in any pom.xml in our repository.
I checked dependency:tree output as well.

I wonder if someone has a hint on how to find where the 2.6.7 dependency came
from.

Thanks

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 13 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.NoSuchMethodError: 
com.fasterxml.jackson.core.JsonParser.getReadCapabilities()Lcom\/fasterxml\/jackson\/core\/util\/JacksonFeatureSet;
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
...
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:344)
... 31 more
Caused by: java.lang.NoSuchMethodError: 
com.fasterxml.jackson.core.JsonParser.getReadCapabilities()Lcom\/fasterxml\/jackson\/core\/util\/JacksonFeatureSet;
at 
com.fasterxml.jackson.databind.DeserializationContext.(DeserializationContext.java:211)
at 
com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.(DefaultDeserializationContext.java:50)
at 
com.fasterxml.jackson.databind.deser.DefaultDeserializationContext$Impl.(DefaultDeserializationContext.java:391)
at 
com.fasterxml.jackson.databind.deser.DefaultDeserializationContext$Impl.createInstance(DefaultDeserializationContext.java:413)
at 
com.fasterxml.jackson.databind.ObjectMapper.createDeserializationContext(ObjectMapper.java:4656)
at 
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4585)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3531)
...
at 
org.hibernate.metamodel.model.convert.internal.JpaAttributeConverterImpl.toDomainValue(JpaAttributeConverterImpl.java:45)
at 
org.hibernate.type.descriptor.converter.AttributeConverterSqlTypeDescriptorAdapter$2.doConversion(AttributeConverterSqlTypeDescriptorAdapter.java:140)
at 
org.hibernate.type.descriptor.converter.AttributeConverterSqlTypeDescriptorAdapter$2.extract(AttributeConverterSqlTypeDescriptorAdapter.java:121)
at 
org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:261)
at 
org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:257)
at 
org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:247)
at 
org.hibernate.type.AbstractStandardBasicType.hydrate(AbstractStandardBasicType.java:333)

-- 
You received this message because you are subscribed to the Google Groups 
"jackson-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jackson-user+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jackson-user/5279ce68-9918-4fe1-aca7-94cf2c484be2n%40googlegroups.com.


[jira] [Resolved] (FLINK-32025) Make job cancellation button on UI configurable

2023-05-07 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved FLINK-32025.

Resolution: Duplicate

> Make job cancellation button on UI configurable
> ---
>
> Key: FLINK-32025
> URL: https://issues.apache.org/jira/browse/FLINK-32025
> Project: Flink
>  Issue Type: Improvement
>    Reporter: Ted Yu
>Priority: Major
>
> On the Flink job UI, there is a `Cancel Job` button.
> When the job UI is shown to users, it is desirable to hide the button so that 
> a normal user doesn't mistakenly cancel a long-running Flink job.
> This issue adds a configuration option for hiding the `Cancel Job` button.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32025) Make job cancellation button on UI configurable

2023-05-07 Thread Ted Yu (Jira)
Ted Yu created FLINK-32025:
--

 Summary: Make job cancellation button on UI configurable
 Key: FLINK-32025
 URL: https://issues.apache.org/jira/browse/FLINK-32025
 Project: Flink
  Issue Type: Improvement
Reporter: Ted Yu


On the Flink job UI, there is a `Cancel Job` button.

When the job UI is shown to users, it is desirable to hide the button so that 
a normal user doesn't mistakenly cancel a long-running Flink job.

This issue adds a configuration option for hiding the `Cancel Job` button.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[h2] Unique index or primary key violation

2023-03-14 Thread Ted Yu
Hi,
I am facing the following error when writing to a table via hibernate:
03/11/23 19:44:52 808 ERROR SqlExceptionHelper: Unique index or primary key 
violation: "PRIMARY_KEY_6 ON PUBLIC.AD(ACCU_ID, TARGET_ID, REPLAY_ID) 
VALUES (8, 24, 1, 7)"; SQL statement:
insert into ad (end_time, start_time, accu_id, target_id, replay_id) values 
(?, ?, ?, ?, ?) [23505-192]

I wonder what the 7 in (8, 24, 1, 7) means. (ACCU_ID, TARGET_ID, REPLAY_ID) has
three columns (the primary key of the ad table), so it is a bit strange that a
fourth number shows up.

Thanks

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/h2-database/6a3606a7-a4bb-4ddf-bfa9-5d5c54e173e6n%40googlegroups.com.


Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-01-21 Thread Ted Yu
On Sat, Jan 21, 2023 at 3:05 AM Dean Rasheed 
wrote:

> On Tue, 10 Jan 2023 at 14:43, Dean Rasheed 
> wrote:
> >
> > Rebased version attached.
> >
>
> Rebased version, following 8eba3e3f02 and 5d29d525ff.
>
> Regards,
> Dean
>
Hi,
In `transform_MERGE_to_join`:

+   if (action->matchKind ==
MERGE_WHEN_NOT_MATCHED_BY_SOURCE)
+   tgt_only_tuples = true;
+   if (action->matchKind ==
MERGE_WHEN_NOT_MATCHED_BY_TARGET)

There should be an `else` in front of the second `if` (an action's matchKind
cannot be both). Also, once `tgt_only_tuples` and `src_only_tuples` are both
true, we can break out of the loop early.
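
A condensed standalone sketch of the suggested control flow (the types and the
loop are simplified stand-ins; only the flag names from the quoted snippet are
kept):

#include <stdbool.h>

typedef enum
{
    MATCHED,
    NOT_MATCHED_BY_SOURCE,
    NOT_MATCHED_BY_TARGET
} MatchKind;

static void
classify_actions(const MatchKind *actions, int nactions,
                 bool *tgt_only_tuples, bool *src_only_tuples)
{
    for (int i = 0; i < nactions; i++)
    {
        /* the two "not matched" kinds are mutually exclusive, hence else-if */
        if (actions[i] == NOT_MATCHED_BY_SOURCE)
            *tgt_only_tuples = true;
        else if (actions[i] == NOT_MATCHED_BY_TARGET)
            *src_only_tuples = true;

        /* nothing left to learn once both flags are set */
        if (*tgt_only_tuples && *src_only_tuples)
            break;
    }
}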

Cheers


Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-21 Thread Ted Yu
On Sat, Jan 21, 2023 at 4:03 AM Hayato Kuroda (Fujitsu) <
kuroda.hay...@fujitsu.com> wrote:

> Dear Katsuragi-san,
>
> Thank you for reviewing! PSA new patch set.
>
> > ## v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
> > +
> > +
> > PQCanConncheckPQCan
> > Conncheck
> > + 
> > +  
> > +   Returns the status of the socket.
> >
> > Is this description right? I think this description is for
> > PQConncheck. Something like "Checks whether PQConncheck is
> > available on this platform." seems better.
>
> It might be copy-and-paste error. Thanks for reporting.
> According to other parts, the sentence should be started like "Returns
> ...".
> So I followed the style and did cosmetic change.
>
> > +/* Check whether the postgres server is still alive or not */
> > +extern int PQConncheck(PGconn *conn);
> > +extern int PQCanConncheck(void);
> >
> > Should the names of these functions be in the form of PQconnCheck?
> > Not PQConncheck (c.f. The section of fe-misc.c in libpq-fe.h).
>
> Agreed, fixed.
>
> > ## v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
> > +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states);
> > +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states_all);
> > +PG_FUNCTION_INFO_V1(postgres_fdw_can_verify_connection_states);
> >
> > This patch adds new functions to postgres_fdw for PostgreSQL 16.
> > So, I think it is necessary to update the version of postgres_fdw (v1.1
> > to v1.2).
>
> I checked postgres_fdw commit log, and it seemed that the version was
> updated when SQL functions are added. Fixed.
>
> > +postgres_fdw_verify_connection_states_all() returns
> > boolean
> > +
> > + 
> > +  This function checks the status of remote connections that are
> > established
> > +  by postgres_fdw from the local session to
> > the foreign
> > +  servers. This check is performed by polling the socket and allows
> >
> > It seems better to add a description that states this function
> > checks all the connections. Like the description of
> > postgres_fdw_disconnect_all(). For example, "This function
> > checks the status of 'all the' remote connections..."?
>
> I checked the docs and fixed. Moreover, some inconsistent statements were
> also fixed.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
Hi,
For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch,
`pqConnCheck_internal` only has one caller, which is quite short.
Can `pqConnCheck_internal` and `PQConnCheck` be merged into one function?

+int
+PQCanConnCheck(void)

It seems the return value should be of bool type.

Cheers


Re: bug: copy progress reporting of backends which run multiple COPYs

2023-01-20 Thread Ted Yu
On Fri, Jan 20, 2023 at 4:51 PM Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Thu, 19 Jan 2023 at 06:47, Justin Pryzby  wrote:
> >
> > pg_stat_progress_copy was added in v14 (8a4f618e7, 9d2d45700).
> >
> > But if a command JOINs file_fdw tables, the progress report gets bungled
> > up.  This will warn/assert during file_fdw tests.
>
> I don't know what to do with that other than disabling COPY progress
> reporting for file_fdw, i.e. calls to BeginCopyFrom that don't supply
> a pstate. This is probably the best option, because a table backed by
> file_fdw would also interfere with COPY TO's progress reporting.
>
> Attached a patch that solves this specific issue in a
> binary-compatible way. I'm not super happy about relying on behavior
> of callers of BeginCopyFrom (assuming that users that run copy
> concurrently will not provide a ParseState* to BeginCopyFrom), but it
> is what it is.
>
> Kind regards,
>
> Matthias van de Meent
>
Hi,
In `BeginCopyFrom`, I see the following:

if (pstate)
{
cstate->range_table = pstate->p_rtable;
cstate->rteperminfos = pstate->p_rteperminfos;

Is it possible to check `range_table` / `rteperminfos` so that we don't need
to introduce the bool field?

Cheers


Re: Named Operators

2023-01-20 Thread Ted Yu
On Fri, Jan 20, 2023 at 9:17 AM Gurjeet Singh  wrote:

> On Sat, Jan 14, 2023 at 6:14 AM Gurjeet Singh  wrote:
> >
> > I agree that an identifier _surrounded_ by the same token (e.g. #foo#)
> > or the pairing token (e.g. {foo}) looks better aesthetically, so I am
> > okay with any of the following variations of the scheme, as well:
> >
> > \#foo\#  (tested; works)
> > \#foo#   (not tested; reduces ident length by 1)
> >
> > We can choose a different character, instead of #. Perhaps \{foo} !
>
> Please find attached the patch that uses \{foo} styled Named
> Operators. This is in line with Tom's reluctant hint at possibly using
> curly braces as delimiter characters. Since the curly braces are used
> by the SQL Specification for row pattern recognition, this patch
> proposes escaping the first of the curly braces.
>
> We can get rid of the leading backslash, if (a) we're confident that
> SQL committee will not use curly braces anywhere else, and (b) if
> we're confident that if/when Postgres supports Row Pattern Recognition
> feature, we'll be able to treat curly braces inside the PATTERN clause
> specially. Since both of those conditions are unlikely, I think we
> must settle for the escaped-first-curly-brace style for the naming our
> operators.
>
> Keeping with the previous posts, here's a sample SQL script showing
> what the proposed syntax will look like in action. Personally, I
> prefer the \#foo style, since the \# prefix stands out among the text,
> better than \{..} does, and because # character is a better signal of
> an operator than {.
>
> create operator \{add_point}
> (function = box_add, leftarg = box, rightarg = point);
> create table test(a box);
> insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
> select a as original, a \{add_point} '(1,1)' as modified from test;
> drop operator \{add_point}(box, point);
>
> Best regards,
> Gurjeet
> http://Gurje.et


Hi,
Since `validIdentifier` doesn't modify the contents of the `name` string, it
seems there is no need to create the `tmp` string in `validNamedOperator`.
You could instead pass the start and end offsets into the string (`name`) as
the second and third parameters to `validIdentifier`.
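
Roughly along these lines (a standalone sketch; the identifier rules below are
simplified placeholders rather than the patch's actual checks):

#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Validate name[start..end), in place, without copying. */
static bool
validIdentifier(const char *name, int start, int end)
{
    if (start >= end)
        return false;

    if (!isalpha((unsigned char) name[start]) && name[start] != '_')
        return false;

    for (int i = start + 1; i < end; i++)
    {
        if (!isalnum((unsigned char) name[i]) && name[i] != '_')
            return false;
    }
    return true;
}

/* For "\{foo}", validate only the part between the braces. */
static bool
validNamedOperator(const char *name)
{
    int     len = (int) strlen(name);

    if (len < 4 || name[0] != '\\' || name[1] != '{' || name[len - 1] != '}')
        return false;

    return validIdentifier(name, 2, len - 1);
}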

Cheers


Re: Operation log for major operations

2023-01-20 Thread Ted Yu
On Fri, Jan 20, 2023 at 1:19 AM Dmitry Koval  wrote:

> Thanks, Ted Yu!
>
>  > Please update year for the license in pg_controllog.c
>
> License updated for files pg_controllog.c, controllog_utils.c,
> pg_controllog.h, controllog_utils.h.
>
>  > +check_file_exists(const char *datadir, const char *path)
>  > There is existing helper function such as:
>  > src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char
> *name);
>  > Is it possible to reuse that code ?
>
> There are a lot of functions that check the file existence:
>
> 1) src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char
> *name);
> 2) src/backend/jit/jit.c:static bool file_exists(const char *name);
> 3) src/test/regress/pg_regress.c:bool file_exists(const char *file);
> 4) src/bin/pg_upgrade/exec.c:bool pid_lock_file_exists(const char
> *datadir);
> 5) src/backend/commands/extension.c:bool extension_file_exists(const
> char *extensionName);
>
> But there is no unified function: different components use their own
> function with their own specific.
> Probably we can not reuse code of dfmgr.c:file_exists() because this
> function skip "errno == EACCES" (this error is critical for us).
> I copied for src/bin/pg_rewind/file_ops.c:check_file_exists() code of
> function jit.c:file_exists() (with adaptation for the utility).
>
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com

Hi,
Maybe another discussion thread can be created for the consolidation of
XX_file_exists functions.

Cheers


Re: Operation log for major operations

2023-01-19 Thread Ted Yu
On Thu, Jan 19, 2023 at 1:12 PM Dmitry Koval  wrote:

>  >The patch does not apply on top of HEAD ...
>
> Thanks!
> Here is a fixed version.
>
> Additional changes:
> 1) get_operation_log() function doesn't create empty operation log file;
> 2) removed extra unlink() call.
>
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com

Hi,

Copyright (c) 1996-2022

Please update the year for the license in pg_controllog.c.

+check_file_exists(const char *datadir, const char *path)

There are existing helper functions, such as:

src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char *name);

Is it possible to reuse that code?

Cheers


[jira] [Updated] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-42090:
---
Description: 
Previously a boolean variable, saslTimeoutSeen, was used in 
RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.

  was:
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean 
variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.


> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Ted Yu (Jira)
Ted Yu created SPARK-42090:
--

 Summary: Introduce sasl retry count in RetryingBlockTransferor
 Key: SPARK-42090
 URL: https://issues.apache.org/jira/browse/SPARK-42090
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Ted Yu


Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean 
variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



properly sizing attnums in pg_get_publication_tables

2023-01-13 Thread Ted Yu
Hi,
I was looking at commit b7ae03953690a1dee455ba3823cc8f71a72cbe1d .

In `pg_get_publication_tables`, `attnums` is allocated with size
`desc->natts`. However, since some columns may be dropped, this size may be
larger than necessary.
When `nattnums > 0` is false, there is no need to allocate the `attnums`
array at all; in its current form, `attnums` would have to be freed in that
scenario.

Please take a look at the patch, which moves the allocation inside the
`if (nattnums > 0)` block.
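
In other words, something of this shape (a simplified standalone stand-in for
the real palloc/bitmap logic, just to show the intended placement of the
allocation):

#include <stdbool.h>
#include <stdlib.h>

typedef short AttrNumber;       /* stand-in for PostgreSQL's AttrNumber */

static AttrNumber *
collect_attnums(const bool *is_dropped, int natts, int *nattnums_out)
{
    AttrNumber *attnums = NULL;
    int         nattnums = 0;

    /* count the live columns first */
    for (int i = 0; i < natts; i++)
        if (!is_dropped[i])
            nattnums++;

    if (nattnums > 0)
    {
        int     j = 0;

        /* sized to the live columns only, and never allocated when empty */
        attnums = malloc(nattnums * sizeof(AttrNumber));
        if (attnums == NULL)
        {
            *nattnums_out = 0;
            return NULL;
        }
        for (int i = 0; i < natts; i++)
            if (!is_dropped[i])
                attnums[j++] = (AttrNumber) (i + 1);
    }

    *nattnums_out = nattnums;
    return attnums;
}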

Thanks


proper-sizing-of-attnums.patch
Description: Binary data


Re: GUC for temporarily disabling event triggers

2023-01-12 Thread Ted Yu
On Thu, Jan 12, 2023 at 12:26 PM Daniel Gustafsson  wrote:

> > On 11 Jan 2023, at 17:38, vignesh C  wrote:
> >
> > On Tue, 29 Nov 2022 at 18:16, Daniel Gustafsson  wrote:
> >>
> >>> On 3 Nov 2022, at 21:47, Daniel Gustafsson  wrote:
> >>
> >>> The patch adds a new GUC, ignore_event_trigger with two option values,
> 'all'
> >>> and 'none' (the login event patch had 'login' as well).
> >>
> >> The attached v2 fixes a small bug which caused testfailures the CFBot.
> >
> > The patch does not apply on top of HEAD as in [1], please post a rebased
> patch:
>
> The attached rebased v3 fixes the conflict.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
Hi,

`this GUC allows to temporarily suspending event triggers.`

It would be better to mention the name of the GUC in the description.
Typo: suspending -> suspend

W.r.t. the GUC `ignore_event_trigger`: since it is supposed to disable event
triggers for a short period of time, is there a mechanism to turn it off
(IGNORE_EVENT_TRIGGER_ALL) automatically?

Cheers


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-11 Thread Ted Yu
On Wed, Jan 11, 2023 at 6:54 PM Amit Kapila  wrote:

> On Wed, Jan 11, 2023 at 9:31 AM Ted Yu  wrote:
> >
> > On Tue, Jan 10, 2023 at 7:55 PM houzj.f...@fujitsu.com <
> houzj.f...@fujitsu.com> wrote:
> >>
> >> On Wednesday, January 11, 2023 10:21 AM Ted Yu 
> wrote:
> >> > /* First time through, initialize parallel apply worker state
> hashtable. */
> >> > if (!ParallelApplyTxnHash)
> >> >
> >> > I think it would be better if `ParallelApplyTxnHash` is created by
> the first
> >> > successful parallel apply worker.
> >>
> >> Thanks for the suggestion. But I am not sure if it's worth to changing
> the
> >> order here, because It will only optimize the case where user enable
> parallel
> >> apply but never get an available worker which should be rare. And in
> such a
> >> case, it'd be better to increase the number of workers or disable the
> parallel mode.
> >>
> >
> >
> > I think even though the chance is rare, we shouldn't leak resource.
> >
>
> But that is true iff we are never able to start the worker. Anyway, I
> think it is probably fine either way but we can change it as per your
> suggestion to make it more robust and probably for the code clarity
> sake. I'll push this tomorrow unless someone thinks otherwise.
>
> --
> With Regards,
> Amit Kapila.
>

Thanks Amit for the confirmation.

Cheers


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-10 Thread Ted Yu
On Tue, Jan 10, 2023 at 7:55 PM houzj.f...@fujitsu.com <
houzj.f...@fujitsu.com> wrote:

> On Wednesday, January 11, 2023 10:21 AM Ted Yu 
> wrote:
> > /* First time through, initialize parallel apply worker state
> hashtable. */
> > if (!ParallelApplyTxnHash)
> >
> > I think it would be better if `ParallelApplyTxnHash` is created by the
> first
> > successful parallel apply worker.
>
> Thanks for the suggestion. But I am not sure if it's worth to changing the
> order here, because It will only optimize the case where user enable
> parallel
> apply but never get an available worker which should be rare. And in such a
> case, it'd be better to increase the number of workers or disable the
> parallel mode.
>
> Best Regards,
> Hou zj
>

I think that even though the chance is rare, we shouldn't leak the resource.

`ParallelApplyTxnHash` shouldn't be created if not even a single apply worker
could be launched.


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-10 Thread Ted Yu
On Tue, Jan 10, 2023 at 6:13 PM houzj.f...@fujitsu.com <
houzj.f...@fujitsu.com> wrote:

> On Wednesday, January 11, 2023 1:25 AM Ted Yu  wrote:
>
> > I was reading src/backend/replication/logical/applyparallelworker.c .
> > In `pa_allocate_worker`, when pa_launch_parallel_worker returns NULL, I
> think the `ParallelApplyTxnHash` should be released.
>
> Thanks for reporting.
>
> ParallelApplyTxnHash is used to cache the state of streaming transactions
> being
> applied. There could be multiple streaming transactions being applied in
> parallel and their information were already saved in ParallelApplyTxnHash,
> so
> we should not release them just because we don't have a worker available to
> handle a new transaction here.
>
> Best Regards,
> Hou zj
>
Hi,

/* First time through, initialize parallel apply worker state
hashtable. */
if (!ParallelApplyTxnHash)

I think it would be better if `ParallelApplyTxnHash` were created only after
the first parallel apply worker has been launched successfully.
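
The idea, as a simplified standalone sketch (stand-in types and a stub
launcher; not the attached patch itself):

#include <stdlib.h>

typedef struct WorkerInfo { int id; } WorkerInfo;   /* stand-in type */
typedef struct TxnHash { int unused; } TxnHash;     /* stand-in type */

static TxnHash *ParallelApplyTxnHash = NULL;

/* stub standing in for pa_launch_parallel_worker(); may return NULL */
static WorkerInfo *
launch_parallel_worker(void)
{
    return NULL;
}

static WorkerInfo *
allocate_worker_sketch(void)
{
    WorkerInfo *winfo = launch_parallel_worker();

    /* launch first: a failed launch leaves nothing behind to clean up */
    if (winfo == NULL)
        return NULL;

    /* create the hash table only once a worker has actually been obtained */
    if (ParallelApplyTxnHash == NULL)
        ParallelApplyTxnHash = calloc(1, sizeof(TxnHash));

    return winfo;
}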

Please take a look at the new patch and see if it makes sense.

Cheers


create-parallel-apply-txn-hash-after-the-first-worker.patch
Description: Binary data


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-10 Thread Ted Yu
On Tue, Jan 10, 2023 at 9:43 AM Ted Yu  wrote:

>
>
> On Tue, Jan 10, 2023 at 9:26 AM Ted Yu  wrote:
>
>>
>>
>> On Tue, Jan 10, 2023 at 9:25 AM Ted Yu  wrote:
>>
>>> Hi,
>>> I was reading src/backend/replication/logical/applyparallelworker.c .
>>> In `pa_allocate_worker`, when pa_launch_parallel_worker returns NULL, I
>>> think the `ParallelApplyTxnHash` should be released.
>>>
>>> Please see the patch.
>>>
>>> Thanks
>>>
>> Here is the patch :-)
>>
>
> In `pa_process_spooled_messages_if_required`, the `pa_unlock_stream` call
> immediately follows `pa_lock_stream`.
> I assume the following is the intended sequence of calls. If this is the
> case, I can add it to the patch.
>
> Cheers
>
> diff --git a/src/backend/replication/logical/applyparallelworker.c
> b/src/backend/replication/logical/applyparallelworker.c
> index 2e5914d5d9..9879b3fff2 100644
> --- a/src/backend/replication/logical/applyparallelworker.c
> +++ b/src/backend/replication/logical/applyparallelworker.c
> @@ -684,9 +684,9 @@ pa_process_spooled_messages_if_required(void)
>  if (fileset_state == FS_SERIALIZE_IN_PROGRESS)
>  {
>  pa_lock_stream(MyParallelShared->xid, AccessShareLock);
> -pa_unlock_stream(MyParallelShared->xid, AccessShareLock);
>
>  fileset_state = pa_get_fileset_state();
> +pa_unlock_stream(MyParallelShared->xid, AccessShareLock);
>  }
>
>  /*
>
Looking closer at the comment above this code and other parts of the file,
it seems the order is intentional.

Please disregard my email about `pa_process_spooled_messages_if_required`.


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-10 Thread Ted Yu
On Tue, Jan 10, 2023 at 9:26 AM Ted Yu  wrote:

>
>
> On Tue, Jan 10, 2023 at 9:25 AM Ted Yu  wrote:
>
>> Hi,
>> I was reading src/backend/replication/logical/applyparallelworker.c .
>> In `pa_allocate_worker`, when pa_launch_parallel_worker returns NULL, I
>> think the `ParallelApplyTxnHash` should be released.
>>
>> Please see the patch.
>>
>> Thanks
>>
> Here is the patch :-)
>

In `pa_process_spooled_messages_if_required`, the `pa_unlock_stream` call
immediately follows `pa_lock_stream`.
I assume the following is the intended sequence of calls. If this is the
case, I can add it to the patch.

Cheers

diff --git a/src/backend/replication/logical/applyparallelworker.c
b/src/backend/replication/logical/applyparallelworker.c
index 2e5914d5d9..9879b3fff2 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -684,9 +684,9 @@ pa_process_spooled_messages_if_required(void)
 if (fileset_state == FS_SERIALIZE_IN_PROGRESS)
 {
 pa_lock_stream(MyParallelShared->xid, AccessShareLock);
-pa_unlock_stream(MyParallelShared->xid, AccessShareLock);

 fileset_state = pa_get_fileset_state();
+pa_unlock_stream(MyParallelShared->xid, AccessShareLock);
 }

 /*


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-10 Thread Ted Yu
On Tue, Jan 10, 2023 at 9:25 AM Ted Yu  wrote:

> Hi,
> I was reading src/backend/replication/logical/applyparallelworker.c .
> In `pa_allocate_worker`, when pa_launch_parallel_worker returns NULL, I
> think the `ParallelApplyTxnHash` should be released.
>
> Please see the patch.
>
> Thanks
>
Here is the patch :-)


destroy-parallel-apply-txn-hash-when-worker-not-launched.patch
Description: Binary data


releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-10 Thread Ted Yu
Hi,
I was reading src/backend/replication/logical/applyparallelworker.c .
In `pa_allocate_worker`, when pa_launch_parallel_worker returns NULL, I
think the `ParallelApplyTxnHash` should be released.

Please see the patch.

Thanks


Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-10 Thread Ted Yu
On Tue, Jan 10, 2023 at 8:26 AM Hayato Kuroda (Fujitsu) <
kuroda.hay...@fujitsu.com> wrote:

> Dear tom,
>
> > I think that it's a really bad idea to require postgres_fdw.sql
> > to have two expected-files: that will be a maintenance nightmare.
> > Please put whatever it is that needs a variant expected-file
> > into its own, hopefully very small and seldom-changed, test script.
> > Or rethink whether you really need a test case that has
> > platform-dependent output.
>
> Thank you for giving the suggestion. I agreed your saying and modifed that.
>
> I added new functions on the libpq and postgres-fdw layer that check
> whether the
> checking works well or not. In the test, at first, the platform is checked
> and
> the checking function is called only when it is supported.
>
> An alternative approach is that PQCanConncheck() can be combined with
> PQConncheck().
> This can reduce the libpq function, but we must define another returned
> value to
> the function like -2. I was not sure which approach was better.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>

Hi,

+   /* quick exit if connection cache has been not initialized yet. */

been not initialized -> not been initialized

+
(errcode(ERRCODE_CONNECTION_FAILURE),
+errmsg("could not connect
to server \"%s\"",

Currently each server that is not connected logs its own warning.
Would it be better to concatenate the names of such servers and log a single
line? That would be cleaner when there are multiple such servers.
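
Something along these lines (a plain-C standalone sketch; the real code would
accumulate the names into a StringInfo and go through ereport(WARNING, ...)):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void
report_failed_servers(const char *const *names, const bool *connected, int n)
{
    char    buf[1024] = "";
    bool    first = true;

    for (int i = 0; i < n; i++)
    {
        if (connected[i])
            continue;
        if (!first)
            strncat(buf, ", ", sizeof(buf) - strlen(buf) - 1);
        strncat(buf, names[i], sizeof(buf) - strlen(buf) - 1);
        first = false;
    }

    /* a single warning listing every server that could not be reached */
    if (!first)
        fprintf(stderr, "WARNING: could not connect to servers: %s\n", buf);
}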

Cheers


[jira] [Created] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader

2023-01-02 Thread Ted Yu (Jira)
Ted Yu created SPARK-41853:
--

 Summary: Use Map in place of SortedMap for ErrorClassesJsonReader
 Key: SPARK-41853
 URL: https://issues.apache.org/jira/browse/SPARK-41853
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Affects Versions: 3.2.3
Reporter: Ted Yu


The use of SortedMap in ErrorClassesJsonReader was mostly for making tests 
easier to write.

This PR replaces SortedMap with Map since SortedMap is slower than Map.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41705) Move generate_protos.sh to dev/

2022-12-25 Thread Ted Yu (Jira)
Ted Yu created SPARK-41705:
--

 Summary: Move generate_protos.sh to dev/ 
 Key: SPARK-41705
 URL: https://issues.apache.org/jira/browse/SPARK-41705
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Ted Yu


connector/connect/dev only contains one script. Moving generate_protos.sh to 
dev/ follows the practice for other scripts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Error-safe user functions

2022-12-24 Thread Ted Yu
On Sat, Dec 24, 2022 at 4:38 AM Andrew Dunstan  wrote:

>
> On 2022-12-24 Sa 04:51, Ted Yu wrote:
> >
> >
> > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
> >
> > Ted Yu  writes:
> > > In makeItemLikeRegex :
> >
> > > +   /* See regexp.c for explanation */
> > > +   CHECK_FOR_INTERRUPTS();
> > > +   pg_regerror(re_result, _tmp, errMsg,
> > > sizeof(errMsg));
> > > +   ereturn(escontext, false,
> >
> > > Since an error is returned, I wonder if the
> > `CHECK_FOR_INTERRUPTS` call is
> > > still necessary.
> >
> > Yes, it is.  We don't want a query-cancel transformed into a soft
> > error.
> >
> > regards, tom lane
> >
> > Hi,
> > For this case (`invalid regular expression`), the potential user
> > interruption is one reason for stopping execution.
> > I feel surfacing user interruption somehow masks the underlying error.
> >
> > The same regex, without user interruption, would exhibit an `invalid
> > regular expression` error.
> > I think it would be better to surface the error.
> >
> >
>
> All that this patch is doing is replacing a call to
> RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar
> code, which gives us the opportunity to call ereturn instead of ereport.
> Note that where escontext is NULL (the common case), ereturn functions
> identically to ereport. So unless you want to argue that the logic in
> RE_compile_and_cache is wrong I don't see what we're arguing about. If
> instead I had altered the API of RE_compile_and_cache to include an
> escontext parameter we wouldn't be having this argument at all. The only
> reason I didn't do that was the point Tom quite properly raised about
> why we're doing any caching here anyway.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Andrew:

Thanks for the response.


Re: Error-safe user functions

2022-12-24 Thread Ted Yu
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:

> Ted Yu  writes:
> > In makeItemLikeRegex :
>
> > +   /* See regexp.c for explanation */
> > +   CHECK_FOR_INTERRUPTS();
> > +   pg_regerror(re_result, _tmp, errMsg,
> > sizeof(errMsg));
> > +   ereturn(escontext, false,
>
> > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call
> is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft error.
>
> regards, tom lane
>
Hi,
For this case (`invalid regular expression`), the potential user
interruption is one reason for stopping execution.
I feel that surfacing the user interruption somewhat masks the underlying error.

The same regex, without a user interruption, would produce an `invalid
regular expression` error, so I think it would be better to surface that error.

Cheers


Re: Error-safe user functions

2022-12-23 Thread Ted Yu
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:

> Ted Yu  writes:
> > In makeItemLikeRegex :
>
> > +   /* See regexp.c for explanation */
> > +   CHECK_FOR_INTERRUPTS();
> > +   pg_regerror(re_result, _tmp, errMsg,
> > sizeof(errMsg));
> > +   ereturn(escontext, false,
>
> > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call
> is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft error.
>
> regards, tom lane
>
Hi,
`ereturn(escontext, ...)` calls appear in multiple places in the patch.
What about the other call sites (w.r.t. checking for interrupts)?

Cheers


Re: Error-safe user functions

2022-12-23 Thread Ted Yu
On Fri, Dec 23, 2022 at 9:20 AM Andrew Dunstan  wrote:

>
> On 2022-12-22 Th 11:44, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> Yeah, I started there, but it's substantially more complex - unlike cube
> >> the jsonpath scanner calls the error routines as well as the parser.
> >> Anyway, here's a patch.
> > I looked through this and it seems generally OK.  A minor nitpick is
> > that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum
> > values.
>
>
> Fixed in the new version attached.
>
>
> > A slightly bigger issue is that makeItemLikeRegex still allows
> > an error to be thrown from RE_compile_and_cache if a bogus regex is
> > presented.  But that could be dealt with later.
>
>
> I'd rather fix it now while we're paying attention.
>
>
> >
> > (I wonder why this is using RE_compile_and_cache at all, really,
> > rather than some other API.  There doesn't seem to be value in
> > forcing the regex into the cache at this point.)
> >
> >
>
>
> I agree. The attached uses pg_regcomp instead. I had a lift a couple of
> lines from regexp.c, but not too many.
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Hi,
In `makeItemLikeRegex`:

+   /* See regexp.c for explanation */
+   CHECK_FOR_INTERRUPTS();
+   pg_regerror(re_result, _tmp, errMsg,
sizeof(errMsg));
+   ereturn(escontext, false,

Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is
still necessary.

 Cheers


Re: Making Vars outer-join aware

2022-12-23 Thread Ted Yu
On Fri, Dec 23, 2022 at 10:21 AM Tom Lane  wrote:

> Here's a new edition of this patch series.
>
> I shoved some preliminary refactoring into the 0001 patch,
> notably splitting deconstruct_jointree into two passes.
> 0002-0009 cover the same ground as they did before, though
> with some differences in detail.  0010-0012 are new work
> mostly aimed at removing kluges we no longer need.
>
> There are two big areas that I would still like to improve, but
> I think we've run out of time to address them in the v16 cycle:
>
> * It'd be nice to apply the regular EquivalenceClass deduction
> mechanisms to outer-join equalities, instead of the
> reconsider_outer_join_clauses kluge.  I've made several stabs at that
> without much success.  I think that the "join domain" framework added
> in 0012 is likely to provide a workable foundation, but some more
> effort is needed.
>
> * I really want to get rid of RestrictInfo.is_pushed_down and
> RINFO_IS_PUSHED_DOWN(), because those seem exceedingly awkward
> and squishy.  I've not gotten far with finding a better
> replacement there, either.
>
> Despite the work being unfinished, I feel that this has moved us a
> long way towards outer-join handling being less of a jury-rigged
> affair.
>
> regards, tom lane
>
Hi,
For v8-0012-invent-join-domains.patch, in `distribute_qual_to_rels`, it
seems that `pseudoconstant` and `root->hasPseudoConstantQuals` carry the
same value.
Can the variable `pseudoconstant` be omitted?

Cheers


Re: checking snapshot argument for index_beginscan

2022-12-22 Thread Ted Yu
>
> Hi,
> I was looking at commit 941aa6a6268a6a66f6895401aad6b5329111d412 .
>
> I think it would be better to move the assertion into
> `index_beginscan_internal` because it is called by index_beginscan
> and index_beginscan_bitmap
>
> In the future, when a new variant of `index_beginscan_XX` is added, the
> assertion would be effective for the new variant.
>
> Please let me know what you think.
>
> Cheers
>


index-begin-scan-check-snapshot.patch
Description: Binary data


Re: Add sub-transaction overflow status in pg_stat_activity

2022-12-19 Thread Ted Yu
On Mon, Dec 19, 2022 at 11:57 AM Robert Haas  wrote:

> On Tue, Dec 13, 2022 at 2:29 AM Julien Rouhaud  wrote:
> > > > Makes sense.
> > >
> > > +1
> >
> > +1
>
> Committed with a bit more word-smithing on the documentation.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
Hi,
It seems the comment for `backend_subxact_overflowed` is missing a word.

Please see the patch.

Thanks


subtxn-number-comment.patch
Description: Binary data


Fixing typo in 002_pg_dump.pl

2022-12-19 Thread Ted Yu
Hi,
I was going over 002_pg_dump.pl and saw a typo in pgdump_runs.

Please see the patch.

Thanks


pg-dump-comment.patch
Description: Binary data


Re: On login trigger: take three

2022-12-19 Thread Ted Yu
On Mon, Dec 19, 2022 at 1:40 AM Mikhail Gribkov  wrote:

> Hi Ted,
>
> > bq. in to the system
> > 'in to' -> into
> > bq. Any bugs in a trigger procedure
> > Any bugs -> Any bug
>
> Thanks!  Fixed typos in the attached v35.
>
> >   +   if (event == EVT_Login)
> >   +   dbgtag = CMDTAG_LOGIN;
> >   +   else
> >   +   dbgtag = CreateCommandTag(parsetree);
> > The same snippet appears more than once. It seems CMDTAG_LOGIN handling
> can be moved into `CreateCommandTag`.
>
> It appears twice in fact, both cases are nearby: in the main code and
> under the assert-checking ifdef.
> Moving CMDTAG_LOGIN to CreateCommandTag would change the CreateCommandTag
> function signature and ideology. CreateCommandTag finds a tag based on the
> command parse tree, while login event is a specific case when we do not
> have any command and the parse tree is NULL. Changing CreateCommandTag
> signature for these two calls looks a little bit overkill as it would lead
> to changing lots of other places the function is called from.
> We could move this snippet to a separate function, but here are the
> similar concerns I think: it would make sense for a more common or a more
> complex snippet, but not for two cases of if-else one-liners.
>
> --
>  best regards,
> Mikhail A. Gribkov
>
> e-mail: youzh...@gmail.com
> *http://www.flickr.com/photos/youzhick/albums
> *
> http://www.strava.com/athletes/5085772
> phone: +7(916)604-71-12
> Telegram: @youzhick
>
> Hi, Mikhail:
Thanks for the explanation.
It is okay to keep the current form of the CMDTAG_LOGIN handling.

Cheers


Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2022-12-18 Thread Ted Yu
On Sun, Dec 18, 2022 at 3:30 PM Nathan Bossart 
wrote:

> Here is a new version of the patch.  Besides some cleanup, I added an index
> on reltoastrelid for the toast-to-main-relation lookup.  Before I bother
> adjusting the tests and documentation, I'm curious to hear thoughts on
> whether this seems like a viable approach.
>
> On Sat, Dec 17, 2022 at 04:39:29AM -0800, Ted Yu wrote:
> > +cluster_is_permitted_for_relation(Oid relid, Oid userid)
> > +{
> > +   return pg_class_aclcheck(relid, userid, ACL_MAINTAIN) ==
> > ACLCHECK_OK ||
> > +  has_parent_privs(relid, userid, ACL_MAINTAIN);
> >
> > Since the func only contains one statement, it seems this can be defined
> as
> > a macro instead.
>
> In the new version, there is a bit more to this function, so I didn't
> convert it to a macro.
>
> > +   List   *ancestors = get_partition_ancestors(relid);
> > +   Oid root = InvalidOid;
> >
> > nit: it would be better if the variable `root` can be aligned with
> variable
> > `ancestors`.
>
> Hm.  It looked alright on my machine.  In any case, I'll be sure to run
> pgindent at some point.  This patch is still in early stages.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com

Hi,

+ * Note: Because this function assumes that the realtion whose OID is
passed as

Typo:  realtion -> relation

Cheers


Re: Rework of collation code, extensibility

2022-12-17 Thread Ted Yu
On Sat, Dec 17, 2022 at 8:54 PM John Naylor 
wrote:

>
> > nul-terminate -> null-terminate
>
> NUL is a common abbreviation for the zero byte (but not for zero
> pointers). See the ascii manpage.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>
> Ah.

`nul-terminated` does appear in the codebase.
Should have checked earlier.


Re: Rework of collation code, extensibility

2022-12-17 Thread Ted Yu
On Sat, Dec 17, 2022 at 7:14 PM Jeff Davis  wrote:

> Attached is a new patch series. I think there are enough changes that
> this has become more of a "rework" of the collation code rather than
> just a refactoring. This is a continuation of some prior work[1][2] in
> a new thread given its new scope.
>
> Benefits:
>
> 1. Clearer division of responsibilities.
> 2. More consistent between libc and ICU providers.
> 3. Hooks that allow extensions to replace collation provider libraries.
> 4. New tests for the collation provider library hooks.
>
> There are a lot of changes, and still some loose ends, but I believe a
> few of these patches are close to ready.
>
> This set of changes does not express an opinion on how we might want to
> support multiple provider libraries in core; but whatever we choose, it
> should be easier to accomplish. Right now, the hooks have limited
> information on which to make the choice for a specific version of a
> collation provider library, but that's because there's limited
> information in the catalog. If the discussion here[3] concludes in
> adding collation provider library or library version information to the
> catalog, we can add additional parameters to the hooks.
>
> [1]
>
> https://postgr.es/m/99aa79cceefd1fe84fda23510494b8fbb7ad1e70.ca...@j-davis.com
> [2]
>
> https://postgr.es/m/c4fda90ec6a7568a896f243a38eb273c3b5c3d93.ca...@j-davis.com
> [3]
>
> https://postgr.es/m/ca+hukgleqmhnpzrgacisoueyfgz8w6ewdhtk2h-4qn0iosf...@mail.gmail.com
>
>
> --
> Jeff Davis
> PostgreSQL Contributor Team - AWS
>
> Hi,
For pg_strxfrm_libc in v4-0002-Add-pg_strxfrm-and-pg_strxfrm_prefix.patch:

+#ifdef HAVE_LOCALE_T
+   if (locale)
+   return strxfrm_l(dest, src, destsize, locale->info.lt);
+   else
+#endif
+   return strxfrm(dest, src, destsize);

It seems the `else` is not needed (since when the if branch is taken, we
return from the func).
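For illustration, the block could simply be arranged as (same calls as above,
just dropping the `else`):

#ifdef HAVE_LOCALE_T
	if (locale)
		return strxfrm_l(dest, src, destsize, locale->info.lt);
#endif
	return strxfrm(dest, src, destsize);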

+   /* nul-terminate arguments */

nul-terminate -> null-terminate

For pg_strnxfrm(), I think `result` can be removed - we directly return the
result from pg_strnxfrm_libc or pg_strnxfrm_icu.

Cheers


Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2022-12-17 Thread Ted Yu
On Fri, Dec 16, 2022 at 10:04 PM Nathan Bossart 
wrote:

> On Thu, Dec 15, 2022 at 10:10:43AM -0800, Jeff Davis wrote:
> > The proposal to skip privilege checks for partitions would be
> > consistent with INSERT, SELECT, REINDEX that flow through to the
> > underlying partitions regardless of permissions/ownership (and even
> > RLS). It would be very minor behavior change on 15 for this weird case
> > of superuser-owned partitions, but I doubt anyone would be relying on
> > that.
>
> I've attached a work-in-progress patch that aims to accomplish this.
> Instead of skipping the privilege checks, I added logic to trawl through
> pg_inherits and pg_class to check whether the user has privileges for the
> partitioned table or for the main relation of a TOAST table.  This means
> that MAINTAIN on a partitioned table is enough to execute maintenance
> commands on all the partitions, and MAINTAIN on a main relation is enough
> to execute maintenance commands on its TOAST table.  Also, the maintenance
> commands that flow through to the partitions or the TOAST table should no
> longer error due to permissions when the user only has MAINTAIN on the
> paritioned table or main relation.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com

Hi,

+cluster_is_permitted_for_relation(Oid relid, Oid userid)
+{
+   return pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK ||
+  has_parent_privs(relid, userid, ACL_MAINTAIN);

Since the func only contains one statement, it seems this can be defined as
a macro instead.
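For instance, a rough sketch of the macro form (illustrative only, built from
the snippet above):

#define cluster_is_permitted_for_relation(relid, userid) \
	(pg_class_aclcheck((relid), (userid), ACL_MAINTAIN) == ACLCHECK_OK || \
	 has_parent_privs((relid), (userid), ACL_MAINTAIN))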

+   List   *ancestors = get_partition_ancestors(relid);
+   Oid root = InvalidOid;

nit: it would be better if the variable `root` can be aligned with variable
`ancestors`.

Cheers


Re: On login trigger: take three

2022-12-17 Thread Ted Yu
On Sat, Dec 17, 2022 at 3:46 AM Mikhail Gribkov  wrote:

> Hi Nikita,
>
> > Mikhail, I've checked the patch and previous discussion,
> > the condition mentioned earlier is still actual:
>
> Thanks for pointing this out! My bad, I forgot to fix the documentation
> example.
> Attached v34 has this issue fixed, as well as a couple other problems with
> the example code.
>
> --
>  best regards,
> Mikhail A. Gribkov
>
Hi,

bq. in to the system

'in to' -> into

bq. Any bugs in a trigger procedure

Any bugs -> Any bug

+   if (event == EVT_Login)
+   dbgtag = CMDTAG_LOGIN;
+   else
+   dbgtag = CreateCommandTag(parsetree);

The same snippet appears more than once. It seems CMDTAG_LOGIN handling can
be moved into `CreateCommandTag`.

Cheers


[jira] [Created] (SPARK-41419) [K8S] Decrement PVC_COUNTER when the pod deletion happens

2022-12-06 Thread Ted Yu (Jira)
Ted Yu created SPARK-41419:
--

 Summary: [K8S] Decrement PVC_COUNTER when the pod deletion happens 
 Key: SPARK-41419
 URL: https://issues.apache.org/jira/browse/SPARK-41419
 Project: Spark
  Issue Type: Task
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Ted Yu


commit cc55de3 introduced PVC_COUNTER to track the outstanding number of PVCs.

PVC_COUNTER should only be decremented when the pod deletion happens (in
response to an error).

If the PVC isn't created successfully (so PVC_COUNTER isn't incremented),
possibly because execution never reaches the resource(pvc).create() call, we
shouldn't decrement the counter.
The variable `success` tracks the progress of PVC creation:

value 0 means the PVC is not created.
value 1 means the PVC has been created.
value 2 means the PVC has been created but, due to a subsequent error, the
pod is deleted.








Re: Add tracking of backend memory allocated to pg_stat_activity

2022-11-27 Thread Ted Yu
On Sun, Nov 27, 2022 at 7:41 AM Justin Pryzby  wrote:

> On Sun, Nov 27, 2022 at 12:32:19AM -0500, Reid Thompson wrote:
> > @@ -32,6 +33,12 @@ typedef enum BackendState
> >   STATE_DISABLED
> >  } BackendState;
> >
> > +/* Enum helper for reporting memory allocated bytes */
> > +enum allocation_direction
> > +{
> > + DECREASE = -1,
> > + INCREASE = 1,
> > +};
>
> BTW, these should have some kind of prefix, like PG_ALLOC_* to avoid
> causing the same kind of problem for someone else that another header
> caused for you by defining something somewhere called IGNORE (ignore
> what, I don't know).  The other problem was probably due to a define,
> though.  Maybe instead of an enum, the function should take a boolean.
>
> I still wonder whether there needs to be a separate CF entry for the
> 0001 patch.  One issue is that there's two different lists of people
> involved in the threads.
>
> --
> Justin
>
>
> I am a bit curious: why is the allocation_direction enum needed ?

pgstat_report_allocated_bytes() can be given the amount (either negative or
positive) to adjust directly.
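For illustration, callers could then do something along these lines
(hypothetical signature and variable name, just to show the signed-amount
idea):

	pgstat_report_allocated_bytes((int64) size);	/* increase by size */
	pgstat_report_allocated_bytes(-(int64) size);	/* decrease by size */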

Cheers


Re: Add tracking of backend memory allocated to pg_stat_activity

2022-11-27 Thread Ted Yu
On Sat, Nov 26, 2022 at 9:32 PM Reid Thompson 
wrote:

> On Sat, 2022-11-26 at 22:10 -0500, Reid Thompson wrote:
> > Code rebased to current master.
> > Updated to incorporate additional recommendations from the the list
> >- add units to variables in view
> >- remove 'backend_' prefix from variables/functions
> >- update documentation
> >- add functional test for allocated_bytes
> >- refactor allocation reporting to reduce number of functions and
> >  branches/reduce performance hit
> >- zero allocated bytes after fork to avoid double counting
> > postmaster allocations
> >
> >
> >
> >
>
> attempt to remedy cfbot windows build issues
>
>
> Hi,

+   if (request_size > *mapped_size)
+   {
+   pgstat_report_allocated_bytes(*mapped_size, DECREASE);
+   pgstat_report_allocated_bytes(request_size, INCREASE);

pgstat_report_allocated_bytes is called twice for this case. Can the two
calls be combined into one (with request_size - *mapped_size, INCREASE) ?
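i.e. a single call along the lines of (sketch only, reusing the variables
from the hunk above):

	pgstat_report_allocated_bytes(request_size - *mapped_size, INCREASE);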

Cheers


[jira] [Resolved] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0

2022-11-26 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-41274.

Resolution: Duplicate

This is dup of commit 02a2242a45062755bf7e20805958d5bdf1f5ed74

> Bump Kubernetes Client Version to 6.2.0
> ---
>
> Key: SPARK-41274
> URL: https://issues.apache.org/jira/browse/SPARK-41274
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Bump Kubernetes Client Version to 6.2.0






[jira] [Created] (SPARK-41274) Bump Kubernetes Client Version to 6.2.0

2022-11-26 Thread Ted Yu (Jira)
Ted Yu created SPARK-41274:
--

 Summary: Bump Kubernetes Client Version to 6.2.0
 Key: SPARK-41274
 URL: https://issues.apache.org/jira/browse/SPARK-41274
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Ted Yu


Bump Kubernetes Client Version to 6.2.0






Re: checking rd_rules in RelationBuildDesc

2022-11-25 Thread Ted Yu
On Fri, Nov 25, 2022 at 8:17 AM Tom Lane  wrote:

> Ted Yu  writes:
> > I wonder if we should check relation->rd_rules after the call
> > to RelationBuildRuleLock().
>
> That patch is both pointless and wrong.  There is some
> value in updating relhasrules in the catalog, so that future
> relcache loads don't uselessly call RelationBuildRuleLock;
> but we certainly can't try to do so right there.  That being
> the case, making the relcache be out of sync with what's on
> disk cannot have any good consequences.  The most likely
> effect is that it would block later logic from fixing things
> correctly.  There is logic in VACUUM to clean out obsolete
> relhasrules flags (see vac_update_relstats), but I suspect
> that would no longer work properly if we did this.
>
> regards, tom lane
>
Hi,
Thanks for evaluating the patch.

The change was originating from what we have in
RelationCacheInitializePhase3():

        if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
{
RelationBuildRuleLock(relation);
if (relation->rd_rules == NULL)
relation->rd_rel->relhasrules = false;

FYI


checking rd_rules in RelationBuildDesc

2022-11-25 Thread Ted Yu
Hi,
(In light of commit 7b2ccc5e03bf16d1e1bbabca25298108c839ec52)

In RelationBuildDesc(), we have:

if (relation->rd_rel->relhasrules)
RelationBuildRuleLock(relation);

I wonder if we should check relation->rd_rules after the call
to RelationBuildRuleLock().

Your comment is appreciated.


build-desc-check-rules.patch
Description: Binary data


Re: indentation in _hash_pgaddtup()

2022-11-24 Thread Ted Yu
On Thu, Nov 24, 2022 at 12:31 PM Tom Lane  wrote:

> David Rowley  writes:
> > After running pgindent on v2, I see it still pushes the lines out
> > quite far. If I add a new line after PageGetItemId(page, and put the
> > variable assignment away from the variable declaration then it looks a
> > bit better. It's still 1 char over the limit.
>
> If you wanted to be hard-nosed about 80 character width, you could
> pull out the PageGetItemId call into a separate local variable.
> I wasn't going to be quite that picky, but I won't object if that
> seems better to you.
>
> regards, tom lane
>

Patch v4 stores ItemId in a local variable.


hash-pgaddtup-indent-v4.patch
Description: Binary data


Re: indentation in _hash_pgaddtup()

2022-11-24 Thread Ted Yu
On Thu, Nov 24, 2022 at 7:11 AM Tom Lane  wrote:

> Daniel Gustafsson  writes:
> >> On 24 Nov 2022, at 13:42, Ted Yu  wrote:
> >> In _hash_pgaddtup(), it seems the indentation is off for the assertion.
>
> > Indentation is handled by applying src/tools/pgindent to the code, and
> > re-running it on this file yields no re-indentation so this is in fact
> correct
> > according to the pgindent rules.
>
> It is one messy bit of code though --- perhaps a little more thought
> about where to put line breaks would help?  Alternatively, it could
> be split into multiple statements, along the lines of
>
> #ifdef USE_ASSERT_CHECKING
> if (PageGetMaxOffsetNumber(page) > 0)
> {
> IndexTuple lasttup = PageGetItem(page,
>  PageGetItemId(page,
>
>  PageGetMaxOffsetNumber(page)));
>
> Assert(_hash_get_indextuple_hashkey(lasttup) <=
>_hash_get_indextuple_hashkey(itup));
> }
> #endif
>
> (details obviously tweakable)
>
> regards, tom lane
>

Thanks Tom for the suggestion.

Here is patch v2.


hash-pgaddtup-indent-v2.patch
Description: Binary data


indentation in _hash_pgaddtup()

2022-11-24 Thread Ted Yu
Hi,
I was looking at :

commit d09dbeb9bde6b9faabd30e887eff4493331d6424
Author: David Rowley 
Date:   Thu Nov 24 17:21:44 2022 +1300

Speedup hash index builds by skipping needless binary searches

In _hash_pgaddtup(), it seems the indentation is off for the assertion.

Please take a look at the patch.

Thanks


hash-pgaddtup-indent.patch
Description: Binary data


Re: cleanup in open_auth_file

2022-11-23 Thread Ted Yu
On Wed, Nov 23, 2022 at 4:54 PM Ted Yu  wrote:

> Hi,
> I was looking at the following commit:
>
> commit efc981627a723d91e86865fb363d793282e473d1
> Author: Michael Paquier 
> Date:   Thu Nov 24 08:21:55 2022 +0900
>
> Rework memory contexts in charge of HBA/ident tokenization
>
> I think when the file cannot be opened, the context should be deleted.
>
> Please see attached patch.
>
> I also modified one comment where `deleted` would be more appropriate verb
> for the context.
>
> Cheers
>

Thinking more on this.
The context should be created when the file is successfully opened.
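Roughly what I have in mind (a sketch only; the variable and context names
here are illustrative, not necessarily the ones used in the commit):

	file = AllocateFile(filename, "r");
	if (file == NULL)
		return NULL;			/* nothing allocated yet, nothing to delete */

	linecxt = AllocSetContextCreate(CurrentMemoryContext,
									"tokenize_auth_file",
									ALLOCSET_SMALL_SIZES);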

Please take a look at patch v2.


open-auth-file-cleanup-v2.patch
Description: Binary data


cleanup in open_auth_file

2022-11-23 Thread Ted Yu
Hi,
I was looking at the following commit:

commit efc981627a723d91e86865fb363d793282e473d1
Author: Michael Paquier 
Date:   Thu Nov 24 08:21:55 2022 +0900

Rework memory contexts in charge of HBA/ident tokenization

I think when the file cannot be opened, the context should be deleted.

Please see attached patch.

I also modified one comment where `deleted` would be more appropriate verb
for the context.

Cheers


open-auth-file-cleanup.patch
Description: Binary data


Re: [CAUTION!! freemail] Re: Partial aggregates pushdown

2022-11-22 Thread Ted Yu
On Tue, Nov 22, 2022 at 1:11 AM fujii.y...@df.mitsubishielectric.co.jp <
fujii.y...@df.mitsubishielectric.co.jp> wrote:

> Hi Mr.Yu.
>
> Thank you for comments.
>
> > + * Check that partial aggregate agg has compatibility
> >
> > If the `agg` refers to func parameter, the parameter name is aggform
> I fixed the above typo and made the above comment easy to understand
> New comment is "Check that partial aggregate function of aggform exsits in
> remote"
>
> > +   int32  partialagg_minversion = PG_VERSION_NUM;
> > +   if (aggform->partialagg_minversion ==
> > PARTIALAGG_MINVERSION_DEFAULT) {
> > +   partialagg_minversion = PG_VERSION_NUM;
> >
> >
> > I am curious why the same variable is assigned the same value twice. It
> seems
> > the if block is redundant.
> >
> > +   if ((fpinfo->server_version >= partialagg_minversion)) {
> > +   compatible = true;
> >
> >
> > The above can be simplified as: return fpinfo->server_version >=
> > partialagg_minversion;
> I fixed according to your comment.
>
> Sincerely yours,
> Yuuki Fujii
>
>
> Hi,
Thanks for the quick response.


Re: Partial aggregates pushdown

2022-11-21 Thread Ted Yu
On Mon, Nov 21, 2022 at 5:02 PM fujii.y...@df.mitsubishielectric.co.jp <
fujii.y...@df.mitsubishielectric.co.jp> wrote:

> Hi Mr.Vondra, Mr.Pyhalov, Everyone.
>
> I discussed with Mr.Pyhalov about the above draft by directly sending mail
> to
>  him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his patch
> along with the above draft. So I update Mr.Pyhalov's patch v10.
>
> I wrote my patch for discussion.
> My patch passes regression tests which contains additional basic
> postgres_fdw tests
> for my patch's feature. But my patch doesn't contain sufficient documents
> and tests.
> If reviewers accept my approach, I will add documents and tests to my
> patch.
>
> The following is a my patch's readme.
> # I simplified the above draft.
>
> --readme of my patch
> 1. interface
> 1) pg_aggregate
> There are the following additional columns.
> a) partialaggfn
>   data type: regproc.
>   default value: zero(means invalid).
>   description  : This field refers to the special aggregate function(then
> we call
>  this partialaggfunc)
> corresponding to aggregation function(then we call src) which has
> aggfnoid.
> partialaggfunc is used for partial aggregation pushdown by
> postgres_fdw.
> The followings are differences between the src and the special
> aggregate function.
>   difference1) result type
> The result type is same as the src's transtype if the src's
> transtype
> is not internal.
> Otherwise the result type is bytea.
>   difference2) final func
> The final func does not exist if the src's transtype is not
> internal.
> Otherwize the final func returns serialized value.
> For example, there is a partialaggfunc avg_p_int4 which corresponds to
> avg(int4)
> whose aggtranstype is _int4.
> The result value of avg_p_int4 is a float8 array which consists of
> count and
> summation. avg_p_int4 does not have finalfunc.
> For another example, there is a partialaggfunc avg_p_int8 which
> corresponds to
> avg(int8) whose aggtranstype is internal.
> The result value of avg_p_int8 is a bytea serialized array which
> consists of count
> and summation. avg_p_int8 has finalfunc int8_avg_serialize which is
> serialize function
> of avg(int8). This field is zero if there is no partialaggfunc.
>
> b) partialagg_minversion
>   data type: int4.
>   default value: zero(means current version).
>   description  : This field is the minimum PostgreSQL server version which
> has
> partialaggfunc. This field is used for checking compatibility of
> partialaggfunc.
>
> The above fields are valid in tuples for builtin avg, sum, min, max, count.
> There are additional records which correspond to partialaggfunc for avg,
> sum, min, max,
> count.
>
> 2) pg_proc
> There are additional records which correspond to partialaggfunc for avg,
> sum, min, max,
> count.
>
> 3) postgres_fdw
> postgres_fdw has an additional foreign server option server_version.
> server_version is
> integer value which means remote server version number. Default value of
> server_version
> is zero. server_version is used for checking compatibility of
> partialaggfunc.
>
> 2. feature
> postgres_fdw can pushdown partial aggregation of avg, sum, min, max, count.
> Partial aggregation pushdown is fine when the following two conditions are
> both true.
>   condition1) partialaggfn is valid.
>   condition2) server_version is not less than partialagg_minversion
> postgres_fdw executes pushdown the patialaggfunc instead of a src.
> For example, we issue "select avg_p_int4(c) from t" instead of "select
> avg(c) from t"
> in the above example.
>
> postgres_fdw can pushdown every aggregate function which supports partial
> aggregation
> if you add a partialaggfunc corresponding to the aggregate function by
> create aggregate
> command.
>
> 3. difference between my patch and Mr.Pyhalov's v10 patch.
> 1) In my patch postgres_fdw can pushdown partial aggregation of avg
> 2) In my patch postgres_fdw can pushdown every aggregate function which
> supports partial
>   aggregation if you add a partialaggfunc corresponding to the aggregate
> function.
>
> 4. sample commands in psql
> \c postgres
> drop database tmp;
> create database tmp;
> \c tmp
> create extension postgres_fdw;
> create server server_01 foreign data wrapper postgres_fdw options(host
> 'localhost', dbname 'tmp', server_version '16', async_capable 'true');
> create user mapping for postgres server server_01 options(user 'postgres',
> password 'postgres');
> create server server_02 foreign data wrapper postgres_fdw options(host
> 'localhost', dbname 'tmp', server_version '16', async_capable 'true');
> create user mapping for postgres server server_02 options(user 'postgres',
> password 'postgres');
>
> create table t(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval) partition by list (type);
>
> create table t1(dt timestamp, id int4, name text, total 

Re: Getting rid of SQLValueFunction

2022-11-20 Thread Ted Yu
On Sun, Nov 20, 2022 at 3:12 PM Michael Paquier  wrote:

> On Sun, Nov 20, 2022 at 08:21:10AM -0800, Ted Yu wrote:
> > For get_func_sql_syntax(), the code for cases
> > of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP
> is
> > mostly the same.
> > Maybe we can introduce a helper so that code duplication is reduced.
>
> It would.  Thanks for the suggestion.
>
> Do you like something like the patch 0002 attached?  This reduces a
> bit the overall size of the patch.  Both ought to be merged in the
> same commit, still it is easier to see the simplification created.
> --
> Michael
>
Hi,
Thanks for the quick response.

+ * timestamp.  These require a specific handling with their typmod is given
+ * by the function caller through their SQL keyword.

typo: typmod is given -> typmod given

Other than the above, code looks good to me.

Cheers


Re: Getting rid of SQLValueFunction

2022-11-20 Thread Ted Yu
On Sat, Nov 19, 2022 at 7:01 PM Michael Paquier  wrote:

> On Fri, Nov 18, 2022 at 10:23:58AM +0900, Michael Paquier wrote:
> > Please note that in order to avoid tweaks when choosing the attribute
> > name of function call, this needs a total of 8 new catalog functions
> > mapping to the SQL keywords, which is what the test added by 2e0d80c
> > is about:
> > - current_role
> > - user
> > - current_catalog
> > - current_date
> > - current_time
> > - current_timestamp
> > - localtime
> > - localtimestamp
> >
> > Any objections?
>
> Hearing nothing, I have gone through 0001 again and applied it as
> fb32748 to remove the dependency between names and SQLValueFunction.
> Attached is 0002, to bring back the CI to a green state.
> --
> Michael
>

Hi,
For get_func_sql_syntax(), the code for cases
of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is
mostly the same.
Maybe we can introduce a helper so that code duplication is reduced.
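For example, a small helper that appends the keyword plus its optional typmod
could cover all four cases (purely illustrative; the name and the way the
typmod is fetched are made up here):

static void
append_sql_keyword_with_typmod(StringInfo buf, const char *keyword, List *args)
{
	appendStringInfoString(buf, keyword);
	if (args != NIL)
	{
		int32		typmod = DatumGetInt32(((Const *) linitial(args))->constvalue);

		appendStringInfo(buf, "(%d)", typmod);
	}
}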

Cheers


[jira] [Created] (SPARK-41197) Upgrade Kafka version to 3.3 release

2022-11-18 Thread Ted Yu (Jira)
Ted Yu created SPARK-41197:
--

 Summary: Upgrade Kafka version to 3.3 release
 Key: SPARK-41197
 URL: https://issues.apache.org/jira/browse/SPARK-41197
 Project: Spark
  Issue Type: Improvement
  Components: Java API
Affects Versions: 3.3.1
Reporter: Ted Yu


Kafka 3.3 has been released.

This issue upgrades Kafka dependency to 3.3 release.






redundant check of msg in does_not_exist_skipping

2022-11-17 Thread Ted Yu
Hi,
I was looking at commit aca992040951c7665f1701cd25d48808eda7a809

I think the check of msg after the switch statement is not necessary, even
though the variable msg is used afterward: the switch has no default branch,
so the compiler would warn about any (potentially) missing case, and msg
cannot be left unset.
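A standalone toy example (not PostgreSQL code) of the same pattern: without a
default branch, gcc/clang's -Wswitch reports any enum value that lacks a case,
which is why the extra NULL check adds little.

enum fruit
{
	FRUIT_APPLE,
	FRUIT_PEAR
};

static const char *
fruit_name(enum fruit f)
{
	const char *msg = NULL;

	switch (f)
	{
		case FRUIT_APPLE:
			msg = "apple";
			break;
		case FRUIT_PEAR:
			msg = "pear";
			break;
			/* no default, so -Wswitch flags any unhandled enum value */
	}

	return msg;
}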

How about removing the check ?

Thanks

diff --git a/src/backend/commands/dropcmds.c
b/src/backend/commands/dropcmds.c
index db906f530e..55996940eb 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -518,9 +518,6 @@ does_not_exist_skipping(ObjectType objtype, Node
*object)

 /* no default, to let compiler warn about missing case */
 }
-if (!msg)
-elog(ERROR, "unrecognized object type: %d", (int) objtype);
-
 if (!args)
 ereport(NOTICE, (errmsg(msg, name)));
 else


Re: closing file in adjust_data_dir

2022-11-16 Thread Ted Yu
On Wed, Nov 16, 2022 at 12:28 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 16.11.22 04:31, Ted Yu wrote:
> > On Wed, 16 Nov 2022 at 11:15, Ted Yu  > <mailto:yuzhih...@gmail.com>> wrote:
> >  > On Tue, Nov 15, 2022 at 7:12 PM Japin Li  > <mailto:japi...@hotmail.com>> wrote:
> >  >> After some rethinking, I find the origin code do not have
> problems.
> >  >>
> >  >> If fd is NULL or fgets() returns NULL, the process exits.
> > Otherwise, we
> >  >> call
> >  >> pclose() to close fd.  The code isn't straightforward, however,
> > it is
> >  >> correct.
> >
> > Hi,
> > Please take a look at the following:
> >
> > https://en.cppreference.com/w/c/io/fgets
> > <https://en.cppreference.com/w/c/io/fgets>
> > Quote: If the failure has been caused by some other error, sets the
> > /error/ indicator (see ferror()
> > <https://en.cppreference.com/w/c/io/ferror>) on |stream|. The contents
> > of the array pointed to by |str| are indeterminate (it may not even be
> > null-terminated).
>
> That has nothing to do with the return value of fgets().
>
> Hi, Peter:
Here is how the return value from pclose() is handled in other places:

+   if (pclose_rc != 0)
+   {
+   ereport(ERROR,

The above is very easy to understand, while the check in `adjust_data_dir` is
somewhat harder to comprehend.

I think the formulation presented in patch v1 aligns with the existing checks
of the return value from pclose().
It also gives a distinct error message when the return value from pclose()
indicates an error.

Cheers


Re: closing file in adjust_data_dir

2022-11-15 Thread Ted Yu
On Tue, Nov 15, 2022 at 7:26 PM Japin Li  wrote:

>
> On Wed, 16 Nov 2022 at 11:15, Ted Yu  wrote:
> > On Tue, Nov 15, 2022 at 7:12 PM Japin Li  wrote:
> >> After some rethinking, I find the origin code do not have problems.
> >>
> >> If fd is NULL or fgets() returns NULL, the process exits.  Otherwise, we
> >> call
> >> pclose() to close fd.  The code isn't straightforward, however, it is
> >> correct.
>
> Hi,
Please take a look at the following:

https://en.cppreference.com/w/c/io/fgets

Quote: If the failure has been caused by some other error, sets the
*error* indicator
(see ferror() <https://en.cppreference.com/w/c/io/ferror>) on stream. The
contents of the array pointed to by str are indeterminate (it may not even
be null-terminated).

I think we shouldn't assume that the fd doesn't need to be closed when NULL
is returned from fgets().

Cheers


Re: closing file in adjust_data_dir

2022-11-15 Thread Ted Yu
On Tue, Nov 15, 2022 at 7:12 PM Japin Li  wrote:

>
> On Wed, 16 Nov 2022 at 10:52, Ted Yu  wrote:
> > On Tue, Nov 15, 2022 at 6:35 PM Japin Li  wrote:
> >>
> >> fd = popen(cmd, "r");
> >> -   if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL
> ||
> >> pclose(fd) != 0)
> >> +   if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL)
> >> {
> >> +   pclose(fd);
> >> write_stderr(_("%s: could not determine the data
> directory
> >> using command \"%s\"\n"), progname, cmd);
> >> exit(1);
> >> }
> >>
> >> Here, segfault maybe occurs if fd is NULL.  I think we can remove
> pclose()
> >> safely since the process will exit.
> >>
> >
> > That means we're going back to v1 of the patch.
> >
>
> After some rethinking, I find the origin code do not have problems.
>
> If fd is NULL or fgets() returns NULL, the process exits.  Otherwise, we
> call
> pclose() to close fd.  The code isn't straightforward, however, it is
> correct.
>
>
>
> Please read this sentence from my first post:

If the fgets() call returns NULL, the pclose() would be skipped.


Re: closing file in adjust_data_dir

2022-11-15 Thread Ted Yu
On Tue, Nov 15, 2022 at 6:35 PM Japin Li  wrote:

>
> On Wed, 16 Nov 2022 at 10:06, Ted Yu  wrote:
> >> Hi,
> > That check is a few lines above:
> >
> > +   if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL)
> > {
> >
> > Cheers
>
> Thanks for the explanation.  Comment on v2 patch.
>
> fd = popen(cmd, "r");
> -   if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL ||
> pclose(fd) != 0)
> +   if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL)
> {
> +   pclose(fd);
> write_stderr(_("%s: could not determine the data directory
> using command \"%s\"\n"), progname, cmd);
> exit(1);
> }
>
> Here, segfault maybe occurs if fd is NULL.  I think we can remove pclose()
> safely since the process will exit.
>
> --
> Regards,
> Japin Li.
> ChengDu WenWu Information Technology Co.,Ltd.
>

That means we're going back to v1 of the patch.

Cheers


Re: closing file in adjust_data_dir

2022-11-15 Thread Ted Yu
On Tue, Nov 15, 2022 at 6:02 PM Japin Li  wrote:

>
> On Wed, 16 Nov 2022 at 02:43, Ted Yu  wrote:
> > Hi,
> > I was looking at the commit:
> >
> > commit 2fe3bdbd691a5d11626308e7d660440be6c210c8
> > Author: Peter Eisentraut 
> > Date:   Tue Nov 15 15:35:37 2022 +0100
> >
> > Check return value of pclose() correctly
> >
> > In src/bin/pg_ctl/pg_ctl.c :
> >
> > if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL ||
> > pclose(fd) != 0)
> >
> > If the fgets() call returns NULL, the pclose() would be skipped.
> > Since the original pclose() call was removed, wouldn't this lead to fd
> > leaking ?
> >
> > Please see attached patch for my proposal.
> >
> > Cheers
>
> I think we should check whether fd is NULL or not, otherwise, segmentation
> fault maybe occur.
>
> +   if (pclose(fd) != 0)
> +   {
> +   write_stderr(_("%s: could not close the file following
> command \"%s\"\n"), progname, cmd);
> +   exit(1);
> +   }
>
> Hi,
That check is a few lines above:

+   if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL)
{

Cheers


Re: closing file in adjust_data_dir

2022-11-15 Thread Ted Yu
On Tue, Nov 15, 2022 at 10:43 AM Ted Yu  wrote:

> Hi,
> I was looking at the commit:
>
> commit 2fe3bdbd691a5d11626308e7d660440be6c210c8
> Author: Peter Eisentraut 
> Date:   Tue Nov 15 15:35:37 2022 +0100
>
> Check return value of pclose() correctly
>
> In src/bin/pg_ctl/pg_ctl.c :
>
> if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL ||
> pclose(fd) != 0)
>
> If the fgets() call returns NULL, the pclose() would be skipped.
> Since the original pclose() call was removed, wouldn't this lead to fd
> leaking ?
>
> Please see attached patch for my proposal.
>
> Cheers
>

There was potential leak of fd in patch v1.

Please take a look at patch v2.

Thanks


pg-ctl-close-fd-v2.patch
Description: Binary data


closing file in adjust_data_dir

2022-11-15 Thread Ted Yu
Hi,
I was looking at the commit:

commit 2fe3bdbd691a5d11626308e7d660440be6c210c8
Author: Peter Eisentraut 
Date:   Tue Nov 15 15:35:37 2022 +0100

Check return value of pclose() correctly

In src/bin/pg_ctl/pg_ctl.c :

if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL ||
pclose(fd) != 0)

If the fgets() call returns NULL, the pclose() would be skipped.
Since the original pclose() call was removed, wouldn't this lead to fd
leaking ?

Please see attached patch for my proposal.

Cheers


pg-ctl-close-fd.patch
Description: Binary data


[jira] [Updated] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-20 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-40508:
---
Description: 
When running spark application against spark 3.3, I see the following :
{code}
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
type: CustomPartitioning
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
{code}
The CustomPartitioning works fine with Spark 3.2.1
This PR proposes to relax the code and treat all unknown partitioning the same 
way as that for UnknownPartitioning.

  was:
When running spark application against spark 3.3, I see the following :
```
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
type: CustomPartitioning
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
```
The CustomPartitioning works fine with Spark 3.2.1
This PR proposes to relax the code and treat all unknown partitioning the same 
way as that for UnknownPartitioning.


> Treat unknown partitioning as UnknownPartitioning
> -
>
> Key: SPARK-40508
> URL: https://issues.apache.org/jira/browse/SPARK-40508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Ted Yu
>Priority: Major
>
> When running spark application against spark 3.3, I see the following :
> {code}
> java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
> type: CustomPartitioning
> at 
> org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
> at 
> org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
> {code}
> The CustomPartitioning works fine with Spark 3.2.1
> This PR proposes to relax the code and treat all unknown partitioning the 
> same way as that for UnknownPartitioning.






[jira] [Created] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-20 Thread Ted Yu (Jira)
Ted Yu created SPARK-40508:
--

 Summary: Treat unknown partitioning as UnknownPartitioning
 Key: SPARK-40508
 URL: https://issues.apache.org/jira/browse/SPARK-40508
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Ted Yu


When running spark application against spark 3.3, I see the following :
```
java.lang.IllegalArgumentException: Unsupported data source V2 partitioning 
type: CustomPartitioning
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:46)
at 
org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioning$$anonfun$apply$1.applyOrElse(V2ScanPartitioning.scala:34)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
```
The CustomPartitioning works fine with Spark 3.2.1
This PR proposes to relax the code and treat all unknown partitioning the same 
way as that for UnknownPartitioning.






[jira] [Created] (SPARK-39769) Rename trait Unevaluable

2022-07-13 Thread Ted Yu (Jira)
Ted Yu created SPARK-39769:
--

 Summary: Rename trait Unevaluable
 Key: SPARK-39769
 URL: https://issues.apache.org/jira/browse/SPARK-39769
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Ted Yu


I came upon `trait Unevaluable` which is defined in 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala

Unevaluable is not a word.

There are `valuable` and `invaluable`, but I have never seen Unevaluable.

This issue renames the trait to Unevaluatable.






[jira] [Updated] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running

2022-04-22 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-38998:
---
Description: 
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems somehow the MetricsSystem was stopped in between start() and 
getServletHandlers() calls.
SparkContext should check the MetricsSystem becomes running before calling 
getServletHandlers()

  was:
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()


> Call to MetricsSystem#getServletHandlers() may take place before 
> MetricsSystem becomes running
> --
>
> Key: SPARK-38998
> URL: https://issues.apache.org/jira/browse/SPARK-38998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Ted Yu
>Priority: Major
>
> Sometimes the following exception is observed:
> {code}
> java.lang.IllegalArgumentException: requirement failed: Can only call 
> getServletHandlers on a running MetricsSystem
> at scala.Predef$.require(Predef.scala:281)
> at 
> org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
> at org.apache.spark.SparkContext.(SparkContext.scala:597)
> at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> {code}
> It seems somehow the MetricsSystem was stopped in between start() and 
> getServletHandlers() calls.
> SparkContext should check the MetricsSystem becomes running before calling 
> getServletHandlers()






[jira] [Updated] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running

2022-04-22 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-38998:
---
Description: 
Sometimes the following exception is observed:
{code}
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()

  was:
Sometimes the following exception is observed:
```
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
```
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()


> Call to MetricsSystem#getServletHandlers() may take place before 
> MetricsSystem becomes running
> --
>
> Key: SPARK-38998
> URL: https://issues.apache.org/jira/browse/SPARK-38998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Ted Yu
>Priority: Major
>
> Sometimes the following exception is observed:
> {code}
> java.lang.IllegalArgumentException: requirement failed: Can only call 
> getServletHandlers on a running MetricsSystem
> at scala.Predef$.require(Predef.scala:281)
> at 
> org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
> at org.apache.spark.SparkContext.(SparkContext.scala:597)
> at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> {code}
> It seems SparkContext should wait till the MetricsSystem becomes running 
> before calling getServletHandlers()






[jira] [Created] (SPARK-38998) Call to MetricsSystem#getServletHandlers() may take place before MetricsSystem becomes running

2022-04-22 Thread Ted Yu (Jira)
Ted Yu created SPARK-38998:
--

 Summary: Call to MetricsSystem#getServletHandlers() may take place 
before MetricsSystem becomes running
 Key: SPARK-38998
 URL: https://issues.apache.org/jira/browse/SPARK-38998
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Ted Yu


Sometimes the following exception is observed:
```
java.lang.IllegalArgumentException: requirement failed: Can only call 
getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at 
org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.(SparkContext.scala:597)
at 
org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
```
It seems SparkContext should wait till the MetricsSystem becomes running before 
calling getServletHandlers()






[jira] [Commented] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'

2021-02-24 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290646#comment-17290646
 ] 

Ted Yu commented on SPARK-34532:


Included the test command and some more information in the description.

You should see these errors when you run the command.

> IntervalUtils.add() may result in 'long overflow'
> -
>
> Key: SPARK-34532
> URL: https://issues.apache.org/jira/browse/SPARK-34532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2
>Reporter: Ted Yu
>Priority: Major
>
> I noticed the following when running test suite:
> build/sbt "sql/testOnly *SQLQueryTestSuite"
> {code}
> 19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 
> 6416.0 failed 1 times; aborting job
> [info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
> 19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 6476.0 (TID 7789)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.multiplyExact(Math.java:892)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> {code}
> {code}
> 19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 
> in stage 14744.0 (TID 16705)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.addExact(Math.java:809)
> at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
> at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
> at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
> at 
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
> {code}
> This likely was caused by the following line:
> {code}
> val microseconds = left.microseconds + right.microseconds
> {code}
> We should check whether the addition would produce overflow before adding.






[jira] [Updated] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'

2021-02-24 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-34532:
---
Description: 
I noticed the following when running test suite:

build/sbt "sql/testOnly *SQLQueryTestSuite"
{code}
19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 
6416.0 failed 1 times; aborting job
[info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in 
stage 6476.0 (TID 7789)
java.lang.ArithmeticException: long overflow
at java.lang.Math.multiplyExact(Math.java:892)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
{code}
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
at java.lang.Math.addExact(Math.java:809)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.

  was:
I noticed the following when running test suite:
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
at java.lang.Math.addExact(Math.java:809)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.


> IntervalUtils.add() may result in 'long overflow'
> -
>
> Key: SPARK-34532
> URL: https://issues.apache.org/jira/browse/SPARK-34532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2
>Reporter: Ted Yu
>Priority: Major
>
> I noticed the following when running test suite:
> build/sbt "sql/testOnly *SQLQueryTestSuite"
> {code}
> 19:10:17.977 ERROR org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 
> 6416.0 failed 1 times; aborting job
> [info] - postgreSQL/int4.sql (2 seconds, 543 milliseconds)
> 19:10:20.994 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 
> in stage 6476.0 (TID 7789)
> java.lang.ArithmeticException: long overflow
> at java.lang.Math.multiplyExact(Math.java:892)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodege

[jira] [Created] (SPARK-34532) IntervalUtils.add() may result in 'long overflow'

2021-02-24 Thread Ted Yu (Jira)
Ted Yu created SPARK-34532:
--

 Summary: IntervalUtils.add() may result in 'long overflow'
 Key: SPARK-34532
 URL: https://issues.apache.org/jira/browse/SPARK-34532
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.2
Reporter: Ted Yu


I noticed the following when running test suite:
{code}
19:15:38.255 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 14744.0 (TID 16705)
java.lang.ArithmeticException: long overflow
at java.lang.Math.addExact(Math.java:809)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:105)
at org.apache.spark.sql.types.LongExactNumeric$.plus(numerics.scala:104)
at 
org.apache.spark.sql.catalyst.expressions.Add.nullSafeEval(arithmetic.scala:268)
at 
org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:573)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:97)
{code}
This likely was caused by the following line:
{code}
val microseconds = left.microseconds + right.microseconds
{code}
We should check whether the addition would produce overflow before adding.






[jira] [Commented] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences

2021-02-19 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17287403#comment-17287403
 ] 

Ted Yu commented on SPARK-34476:


The basic jsonb test is here:
https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-cql-4x/src/test/java/org/yb/loadtest/TestSpark3Jsonb.java

I am working on adding a get_json_string() function (via a Spark extension) which 
is similar to get_json_object() but expands the last jsonb field using '->>' 
instead of '->'.
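
For illustration only, a rough usage sketch: get_json_string is the proposed,
hypothetical extension function (assumed to be registered by the extension before
use), and the json path and table name below are made-up examples, not taken from
the test:
{code}
import org.apache.spark.sql.SparkSession

object GetJsonStringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("get_json_string-sketch").getOrCreate()

    // Existing behaviour: the last jsonb step would be pushed down with '->'.
    spark.sql("SELECT get_json_object(phone, '$.key[1].m[2].b') FROM test.person").explain()

    // Proposed behaviour: the last jsonb step is expanded as text with '->>'.
    spark.sql("SELECT get_json_string(phone, '$.key[1].m[2].b') FROM test.person").explain()

    spark.stop()
  }
}
{code}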

> Duplicate referenceNames are given for ambiguousReferences
> --
>
> Key: SPARK-34476
> URL: https://issues.apache.org/jira/browse/SPARK-34476
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ted Yu
>Priority: Major
>
> When running test with Spark extension that converts custom function to json 
> path expression, I saw the following in test output:
> {code}
> 2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
> org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is 
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Reference 
> 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
> {code}
> Please note the candidates following 'could be' are the same.
> Here is the physical plan for a working query where phone is a jsonb column:
> {code}
> TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
> output=[id#6,address#7,key#0])
> +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
>+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
> Scan: test.person
>  - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
>  - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
> {code}
> The difference for the failed query is that it tries to use 
> {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as 
> part of filter).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences

2021-02-19 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-34476:
---
Description: 
When running test with Spark extension that converts custom function to json 
path expression, I saw the following in test output:
{code}
2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == 
Physical Plan ==
org.apache.spark.sql.AnalysisException: Reference 
'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
{code}
Please note the candidates following 'could be' are the same.
Here is the physical plan for a working query where phone is a jsonb column:
{code}
TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
output=[id#6,address#7,key#0])
+- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
   +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
Scan: test.person
 - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
 - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
{code}
The difference for the failed query is that it tries to use 
{code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as 
part of filter).

  was:
When running test with Spark extension that converts custom function to json 
path expression, I saw the following in test output:
{code}
2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == 
Physical Plan ==
org.apache.spark.sql.AnalysisException: Reference 
'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
{code}
Please note the candidates following 'could be' are the same.
Here is the physical plan for a working query where phone is a jsonb column:
{code}
TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
output=[id#6,address#7,key#0])
+- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
   +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
Scan: test.person
 - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
 - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
{code}
The difference for the failed query is that it tries to use 
phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter).


> Duplicate referenceNames are given for ambiguousReferences
> ----------
>
> Key: SPARK-34476
> URL: https://issues.apache.org/jira/browse/SPARK-34476
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ted Yu
>Priority: Major
>
> When running test with Spark extension that converts custom function to json 
> path expression, I saw the following in test output:
> {code}
> 2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
> org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is 
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Reference 
> 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
> mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
> {code}
> Please note the candidates following 'could be' are the same.
> Here is the physical plan for a working query where phone is a jsonb column:
> {code}
> TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
> output=[id#6,address#7,key#0])
> +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
>+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
> Scan: test.person
>  - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
>  - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
> {code}
> The difference for the failed query is that it tries to use 
> {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as 
> part of filter).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences

2021-02-19 Thread Ted Yu (Jira)
Ted Yu created SPARK-34476:
--

 Summary: Duplicate referenceNames are given for ambiguousReferences
 Key: SPARK-34476
 URL: https://issues.apache.org/jira/browse/SPARK-34476
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Ted Yu


When running test with Spark extension that converts custom function to json 
path expression, I saw the following in test output:
{code}
2021-02-19 21:57:24,550 (Time-limited test) [INFO - 
org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is == 
Physical Plan ==
org.apache.spark.sql.AnalysisException: Reference 
'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b', 
mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8
{code}
Please note the candidates following 'could be' are the same.
Here is the physical plan for a working query where phone is a jsonb column:
{code}
TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], 
output=[id#6,address#7,key#0])
+- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0]
   +- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra 
Scan: test.person
 - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]]
 - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b']
{code}
The difference for the failed query is that it tries to use 
phone->'key'->1->'m'->2->>'b' in the projection (which works as part of filter).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34017) Pass json column information via pruneColumns()

2021-01-05 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259423#comment-17259423
 ] 

Ted Yu commented on SPARK-34017:


For PushDownUtils#pruneColumns, I am experimenting with the following:
{code}
  case r: SupportsPushDownRequiredColumns if 
SQLConf.get.nestedSchemaPruningEnabled =>
val JSONCapture = "get_json_object\\((.*), *(.*)\\)".r
var jsonRootFields : ArrayBuffer[RootField] = ArrayBuffer()
projects.map{ _.map{ f => f.toString match {
  case JSONCapture(column, field) =>
jsonRootFields += RootField(StructField(column, f.dataType, 
f.nullable),
  derivedFromAtt = false, prunedIfAnyChildAccessed = true)
  case _ => logDebug("else " + f)
}}}
val rootFields = SchemaPruning.identifyRootFields(projects, filters) ++ 
jsonRootFields
{code}

> Pass json column information via pruneColumns()
> ---
>
> Key: SPARK-34017
> URL: https://issues.apache.org/jira/browse/SPARK-34017
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushDownUtils#pruneColumns only passes root fields to 
> SupportsPushDownRequiredColumns implementation(s).
> {code}
> 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> projection List(id#33, address#34, phone#36, get_json_object(phone#36, 
> $.code) AS get_json_object(phone, $.code)#37)
> 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> StructType(StructField(id,IntegerType,false), 
> StructField(address,StringType,true), StructField(phone,StringType,true))
> {code}
> The first line shows projections and the second line shows the pruned schema.
> We can see that get_json_object(phone#36, $.code) is filtered. This 
> expression retrieves field 'code' from phone json column.
> We should allow json column information to be passed via pruneColumns().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34017) Pass json column information via pruneColumns()

2021-01-05 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-34017:
---
Description: 
Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).
{code}
2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))
{code}
The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered. This expression 
retrieves field 'code' from phone json column.

We should allow json column information to be passed via pruneColumns().

  was:

Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).

2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))

The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered. This expression 
retrieves field 'code' from phone json column.

We should allow json column information to be passed via pruneColumns().


> Pass json column information via pruneColumns()
> ---
>
> Key: SPARK-34017
> URL: https://issues.apache.org/jira/browse/SPARK-34017
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushDownUtils#pruneColumns only passes root fields to 
> SupportsPushDownRequiredColumns implementation(s).
> {code}
> 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> projection List(id#33, address#34, phone#36, get_json_object(phone#36, 
> $.code) AS get_json_object(phone, $.code)#37)
> 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
> org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
> StructType(StructField(id,IntegerType,false), 
> StructField(address,StringType,true), StructField(phone,StringType,true))
> {code}
> The first line shows projections and the second line shows the pruned schema.
> We can see that get_json_object(phone#36, $.code) is filtered. This 
> expression retrieves field 'code' from phone json column.
> We should allow json column information to be passed via pruneColumns().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34017) Pass json column information via pruneColumns()

2021-01-05 Thread Ted Yu (Jira)
Ted Yu created SPARK-34017:
--

 Summary: Pass json column information via pruneColumns()
 Key: SPARK-34017
 URL: https://issues.apache.org/jira/browse/SPARK-34017
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.1
Reporter: Ted Yu



Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).

2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) 
AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema 
StructType(StructField(id,IntegerType,false), 
StructField(address,StringType,true), StructField(phone,StringType,true))

The first line shows projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered. This expression 
retrieves field 'code' from phone json column.

We should allow json column information to be passed via pruneColumns().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError

2021-01-04 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-33997.

Resolution: Cannot Reproduce

Rebuilt Spark locally and the error was gone.

> Running bin/spark-sql gives NoSuchMethodError
> -
>
> Key: SPARK-33997
> URL: https://issues.apache.org/jira/browse/SPARK-33997
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> I ran 'mvn install -Phive -Phive-thriftserver -DskipTests'
> Running bin/spark-sql gives the following error:
> {code}
> 21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map;
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934)
> {code}
> Scala version 2.12.10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError

2021-01-04 Thread Ted Yu (Jira)
Ted Yu created SPARK-33997:
--

 Summary: Running bin/spark-sql gives NoSuchMethodError
 Key: SPARK-33997
 URL: https://issues.apache.org/jira/browse/SPARK-33997
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: Ted Yu


I ran 'mvn install -Phive -Phive-thriftserver -DskipTests'

Running bin/spark-sql gives the following error:
{code}
21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map;
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934)
{code}
Scala version 2.12.10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2021-01-02 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257647#comment-17257647
 ] 

Ted Yu commented on SPARK-33915:


Here is sample code for capturing the column and fields in downstream 
PredicatePushDown.scala
{code}
  private val JSONCapture = "`GetJsonObject\\((.*),(.*)\\)`".r
  private def transformGetJsonObject(p: Predicate): Predicate = {
val eq = p.asInstanceOf[sources.EqualTo]
eq.attribute match {
  case JSONCapture(column,field) =>
val colName = column.toString.split("#")(0)
val names = field.toString.split("\\.").foldLeft(List[String]()){(z, n) 
=> z :+ "->'"+n+"'" }
sources.EqualTo(colName + names.slice(1, names.size).mkString(""), 
eq.value).asInstanceOf[Predicate]
  case _ => sources.EqualTo("foo", "bar").asInstanceOf[Predicate]
}
  }
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>      Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-33915) Allow json expression to be pushable column

2021-01-02 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-33915:
---
Comment: was deleted

(was: Opened https://github.com/apache/spark/pull/30984)

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33953) readImages test fails due to NoClassDefFoundError for ImageTypeSpecifier

2020-12-31 Thread Ted Yu (Jira)
Ted Yu created SPARK-33953:
--

 Summary: readImages test fails due to NoClassDefFoundError for 
ImageTypeSpecifier
 Key: SPARK-33953
 URL: https://issues.apache.org/jira/browse/SPARK-33953
 Project: Spark
  Issue Type: Test
  Components: ML
Affects Versions: 3.0.1
Reporter: Ted Yu


From https://github.com/apache/spark/pull/30984/checks?check_run_id=1630709203 :
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 21.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 21.0 (TID 20) (fv-az212-589.internal.cloudapp.net executor driver): 
java.lang.NoClassDefFoundError: Could not initialize class 
javax.imageio.ImageTypeSpecifier
[info]  at 
com.sun.imageio.plugins.png.PNGImageReader.getImageTypes(PNGImageReader.java:1531)
[info]  at 
com.sun.imageio.plugins.png.PNGImageReader.readImage(PNGImageReader.java:1318)
```
It seems some dependency is missing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257025#comment-17257025
 ] 

Ted Yu commented on SPARK-33915:


Opened https://github.com/apache/spark/pull/30984

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255389#comment-17255389
 ] 

Ted Yu edited comment on SPARK-33915 at 12/31/20, 3:16 PM:
---

Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.code) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'code' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}


was (Author: yuzhih...@gmail.com):
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'phone' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-29 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255389#comment-17255389
 ] 

Ted Yu edited comment on SPARK-33915 at 12/29/20, 6:51 PM:
---

Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'phone' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}


was (Author: yuzhih...@gmail.com):
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [["`GetJsonObject(phone#37,$.phone)`" = ?, 1200]]
 - Requested Columns: [id,address,phone]
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-28 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255631#comment-17255631
 ] 

Ted Yu commented on SPARK-33915:


[~codingcat] [~XuanYuan] [~viirya] [~Alex Herman]
Can you provide your comments?

Thanks

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-27 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255389#comment-17255389
 ] 

Ted Yu commented on SPARK-33915:


Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [["`GetJsonObject(phone#37,$.phone)`" = ?, 1200]]
 - Requested Columns: [id,address,phone]
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254944#comment-17254944
 ] 

Ted Yu edited comment on SPARK-33915 at 12/26/20, 4:14 AM:
---

I have experimented with two patches which allow json expression to be pushable 
column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString where eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding change.
In PushableColumnBase class, new case for GetJsonString is added.

Since code duplication is undesirable and case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler. The return type of GetJsonObject#eval is changed to 
UTF8String.
I have run through catalyst / sql core unit tests which passed.
In PushableColumnBase class, new case for GetJsonObject is added.

In test output, I observed the following (for second patch, output for first 
patch is similar):
{code}
Post-Scan Filters: (get_json_object(phone#37, $.code) = 1200)
{code}
Comment on the two approaches (and other approach) is welcome.

Below is snippet for second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e4f001d61a..1cdc2642ba 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -723,6 +725,12 @@ abstract class PushableColumnBase {
 }
   case s: GetStructField if nestedPredicatePushdownEnabled =>
 helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+  case GetJsonObject(col, field) =>
+Some(Seq("GetJsonObject(" + col + "," + field + ")"))
   case _ => None
 }
 helper(e).map(_.quoted)
{code}


was (Author: yuzhih...@gmail.com):
I have experimented with two patches which allow json expression to be pushable 
column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString where eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding change.
In PushableColumnBase class, new case for GetJsonString is added.

Since code duplication is undesirable and case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler. The return type of GetJsonObject#eval is changed to 
UTF8String.
I have run through catalyst / sql core unit tests which passed.
In PushableColumnBase class, new case for GetJsonObject is added.

In test output, I observed the following (for second patch, output for first 
patch is similar):
{code}
Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)
{code}
Comment on the two approaches (and other approach) is welcome.

Below is snippet for second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/dat

[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254944#comment-17254944
 ] 

Ted Yu edited comment on SPARK-33915 at 12/26/20, 4:14 AM:
---

I have experimented with two patches which allow json expression to be pushable 
column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString where eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding change.
In PushableColumnBase class, new case for GetJsonString is added.

Since code duplication is undesirable and case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler. The return type of GetJsonObject#eval is changed to 
UTF8String.
I have run through catalyst / sql core unit tests which passed.
In PushableColumnBase class, new case for GetJsonObject is added.

In test output, I observed the following (for second patch, output for first 
patch is similar):
{code}
Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)
{code}
Comment on the two approaches (and other approach) is welcome.

Below is snippet for second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e4f001d61a..1cdc2642ba 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -723,6 +725,12 @@ abstract class PushableColumnBase {
 }
   case s: GetStructField if nestedPredicatePushdownEnabled =>
 helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+  case GetJsonObject(col, field) =>
+Some(Seq("GetJsonObject(" + col + "," + field + ")"))
   case _ => None
 }
 helper(e).map(_.quoted)
{code}


was (Author: yuzhih...@gmail.com):
I have experimented with two patches which allow json expression to be pushable 
column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString where eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding change.
In PushableColumnBase class, new case for GetJsonString is added.

Since code duplication is undesirable and case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler. The return type of GetJsonObject#eval is changed to 
UTF8String.
I have run through catalyst / sql core unit tests which passed.
In PushableColumnBase class, new case for GetJsonObject is added.

In test output, I observed the following (for second patch, output for first 
patch is similar):

Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)

Comment on the two approaches (and other approach) is welcome.

Below is snippet for second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/Da

[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254944#comment-17254944
 ] 

Ted Yu commented on SPARK-33915:


I have experimented with two patches which allow json expression to be pushable 
column (without the presence of cast).

The first involves duplicating GetJsonObject as GetJsonString where eval() 
method returns UTF8String. FunctionRegistry.scala and functions.scala would 
have corresponding change.
In PushableColumnBase class, new case for GetJsonString is added.

Since code duplication is undesirable and case class cannot be directly 
subclassed, there is more work to be done in this direction.

The second is simpler. The return type of GetJsonObject#eval is changed to 
UTF8String.
I have run through catalyst / sql core unit tests which passed.
In PushableColumnBase class, new case for GetJsonObject is added.

In test output, I observed the following (for second patch, output for first 
patch is similar):

Post-Scan Filters: (get_json_object(phone#37, $.phone) = 1200)

Comment on the two approaches (and other approach) is welcome.

Below is snippet for second approach.
{code}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index c22b68890a..8004ddd735 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -139,7 +139,7 @@ case class GetJsonObject(json: Expression, path: Expression)

   @transient private lazy val parsedPath = 
parsePath(path.eval().asInstanceOf[UTF8String])

-  override def eval(input: InternalRow): Any = {
+  override def eval(input: InternalRow): UTF8String = {
 val jsonStr = json.eval(input).asInstanceOf[UTF8String]
 if (jsonStr == null) {
   return null
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e4f001d61a..1cdc2642ba 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -723,6 +725,12 @@ abstract class PushableColumnBase {
 }
   case s: GetStructField if nestedPredicatePushdownEnabled =>
 helper(s.child).map(_ :+ s.childSchema(s.ordinal).name)
+  case GetJsonObject(col, field) =>
+Some(Seq("GetJsonObject(" + col + "," + field + ")"))
   case _ => None
 }
 helper(e).map(_.quoted)
{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-33915:
---
Description: 
Currently PushableColumnBase provides no support for json / jsonb expression.

Example of json expression:
{code}
get_json_object(phone, '$.code') = '1200'
{code}
If non-string literal is part of the expression, the presence of cast() would 
complicate the situation.

Implication is that implementation of SupportsPushDownFilters doesn't have a 
chance to perform pushdown even if third party DB engine supports json 
expression pushdown.

This issue is for discussion and implementation of Spark core changes which 
would allow json expression to be recognized as pushable column.

  was:
Currently PushableColumnBase provides no support for json / jsonb expression.

Example of json expression:
get_json_object(phone, '$.code') = '1200'

If non-string literal is part of the expression, the presence of cast() would 
complicate the situation.

Implication is that implementation of SupportsPushDownFilters doesn't have a 
chance to perform pushdown even if third party DB engine supports json 
expression pushdown.

This issue is for discussion and implementation of Spark core changes which 
would allow json expression to be recognized as pushable column.


> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expression.
> Example of json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> Implication is that implementation of SupportsPushDownFilters doesn't have a 
> chance to perform pushdown even if third party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which 
> would allow json expression to be recognized as pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33915) Allow json expression to be pushable column

2020-12-25 Thread Ted Yu (Jira)
Ted Yu created SPARK-33915:
--

 Summary: Allow json expression to be pushable column
 Key: SPARK-33915
 URL: https://issues.apache.org/jira/browse/SPARK-33915
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Ted Yu


Currently PushableColumnBase provides no support for json / jsonb expression.

Example of json expression:
get_json_object(phone, '$.code') = '1200'

If non-string literal is part of the expression, the presence of cast() would 
complicate the situation.

Implication is that implementation of SupportsPushDownFilters doesn't have a 
chance to perform pushdown even if third party DB engine supports json 
expression pushdown.

This issue is for discussion and implementation of Spark core changes which 
would allow json expression to be recognized as pushable column.
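
As an illustration of the cast() point above, a simplified sketch with hand-built
Catalyst expression shapes (these shapes are assumptions for illustration, not code
from any proposal; it only assumes spark-catalyst on the classpath):
{code}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.{IntegerType, StringType}

object JsonPushdownCastSketch {
  def main(args: Array[String]): Unit = {
    val phone = AttributeReference("phone", StringType)()

    // With a string literal the comparison sits directly on GetJsonObject,
    // so a pushdown rule matching on GetJsonObject can recognize it.
    val pushable = EqualTo(GetJsonObject(phone, Literal("$.code")), Literal("1200"))

    // With a non-string literal a Cast ends up wrapping the json expression,
    // and a pattern that only matches GetJsonObject no longer fires.
    val withCast = EqualTo(
      Cast(GetJsonObject(phone, Literal("$.code")), IntegerType), Literal(1200))

    println(pushable.sql)
    println(withCast.sql)
  }
}
{code}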



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2020-07-27 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166082#comment-17166082
 ] 

Ted Yu commented on HBASE-20226:


{code}
+if (v1Regions.size() > 0 || v2Regions.size() > 0) {
{code}
It seems the thread pool is needed when v1Regions.size()+v2Regions.size() > 1.

There are also a few findbugs warnings to be addressed.

> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the 
> region manifests, which have already been rolled up into a larger overall 
> manifest and thus have redundant information.
> This proposal is to do the deletion in a thread pool bounded by 
> hbase.snapshot.thread.pool.max . For large tables with a lot of regions, the 
> current single threaded deletion is taking longer than all the rest of the 
> snapshot tasks when the Hbase data and the snapshot folder are both in a 
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for 
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24137) The max merge count of metafixer should be configurable in MetaFixer

2020-04-08 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078854#comment-17078854
 ] 

Ted Yu commented on HBASE-24137:


Please find people who have touched this class to review it.
I may not have time to review.

> The max merge count of metafixer should be configurable in MetaFixer
> 
>
> Key: HBASE-24137
> URL: https://issues.apache.org/jira/browse/HBASE-24137
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Yu Wang
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 24137_master_1.patch
>
>
> The max merge count of metafixer should be configurable in MetaFixer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

