Re: [BUG] Null Pointer Exception in SourceFormatAdapter
Hi Vinoth, I would like to take it up. Will be sending the PR soon. :) On Mon, Sep 16, 2019 at 9:27 PM Vinoth Chandar wrote: > Actually went ahead and created > https://issues.apache.org/jira/browse/HUDI-253 . > Question is just about the PR for this now ? :) > > On Mon, Sep 16, 2019 at 8:54 AM Vinoth Chandar wrote: > > > +1 DeltaStreamer can be much nicer in such cases.. Any interest in > opening > > a JIRA/PR for this? > > > > On Mon, Sep 16, 2019 at 2:02 AM [email protected] > > wrote: > > > >> Yes, It makes sense to add validations with descriptive messages. > Please > >> open a ticket and send a PR for this. > >> Thanks,Balaji.VOn Monday, September 16, 2019, 01:11:12 AM PDT, > >> Pratyaksh Sharma wrote: > >> > >> Hi Balaji, > >> > >> I get your point. However I feel in such cases, instead of throwing a > Null > >> Pointer, we should handle the case gracefully. The exception should be > >> thrown with proper user-facing message. Please let me know your thoughts > >> on this. > >> > >> On Fri, Sep 13, 2019 at 7:26 PM Balaji Varadarajan > >> wrote: > >> > >> > Hi Pratyaksh, > >> > This is expected. You need to pass a schema-provider since you are > using > >> > Avro Sources.For RowBased sources, DeltaStreamer can deduce schema > from > >> Row > >> > type information available from Spark Dataset. > >> > Balaji.V > >> >On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma < > >> > [email protected]> wrote: > >> > > >> > Hi, > >> > > >> > I am trying to build a CDC pipeline using Hudi working on tag > >> hoodie-0.4.7. > >> > Here is the command I used for running DeltaStreamer - > >> > > >> > spark-submit --files jaas.conf --conf > >> > > >> > 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > >> > --conf > >> > > >> > > >> > 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > >> > --master yarn --deploy-mode cluster --num-executors 2 --class > >> > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer > >> > /path/to/hoodie-utilities-0.4.7.jar --storage-type COPY_ON_WRITE > >> > --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource > >> > --source-ordering-field --target-base-path > hdfs://path/to/cow_table > >> > --target-table cow_table --props > >> hdfs://path/to/fg-kafka-source.properties > >> > --transformer-class > >> com.uber.hoodie.utilities.transform.DebeziumTransformer > >> > --spark-master yarn-cluster --source-limit 5000 > >> > > >> > Basically I have not passed any SchemaProvider class in the command. > >> When I > >> > run the above command, I get the below exception in > SourceFormatAdapter > >> and > >> > the job gets killed - > >> > > >> > java.lang.NullPointerException > >> > at > >> > > >> > > >> > com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94) > >> > at > >> > > >> > > >> > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224) > >> > at > >> > > >> > > >> > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504) > >> > > >> > In HoodieDeltaStreamer class, we try to initiate > RowBasedSchemaProvider > >> > before registering Avro Schemas if the schemaProvider variable is > null. > >> > Hence I am trying to understand if the above exception is expected > >> > behaviour. > >> > > >> > Please help. > >> > > >> > > > > >
Re: [BUG] Null Pointer Exception in SourceFormatAdapter
Actually went ahead and created https://issues.apache.org/jira/browse/HUDI-253 . Question is just about the PR for this now ? :) On Mon, Sep 16, 2019 at 8:54 AM Vinoth Chandar wrote: > +1 DeltaStreamer can be much nicer in such cases.. Any interest in opening > a JIRA/PR for this? > > On Mon, Sep 16, 2019 at 2:02 AM [email protected] > wrote: > >> Yes, It makes sense to add validations with descriptive messages. Please >> open a ticket and send a PR for this. >> Thanks,Balaji.VOn Monday, September 16, 2019, 01:11:12 AM PDT, >> Pratyaksh Sharma wrote: >> >> Hi Balaji, >> >> I get your point. However I feel in such cases, instead of throwing a Null >> Pointer, we should handle the case gracefully. The exception should be >> thrown with proper user-facing message. Please let me know your thoughts >> on this. >> >> On Fri, Sep 13, 2019 at 7:26 PM Balaji Varadarajan >> wrote: >> >> > Hi Pratyaksh, >> > This is expected. You need to pass a schema-provider since you are using >> > Avro Sources.For RowBased sources, DeltaStreamer can deduce schema from >> Row >> > type information available from Spark Dataset. >> > Balaji.V >> >On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma < >> > [email protected]> wrote: >> > >> > Hi, >> > >> > I am trying to build a CDC pipeline using Hudi working on tag >> hoodie-0.4.7. >> > Here is the command I used for running DeltaStreamer - >> > >> > spark-submit --files jaas.conf --conf >> > >> 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' >> > --conf >> > >> > >> 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' >> > --master yarn --deploy-mode cluster --num-executors 2 --class >> > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer >> > /path/to/hoodie-utilities-0.4.7.jar --storage-type COPY_ON_WRITE >> > --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource >> > --source-ordering-field --target-base-path hdfs://path/to/cow_table >> > --target-table cow_table --props >> hdfs://path/to/fg-kafka-source.properties >> > --transformer-class >> com.uber.hoodie.utilities.transform.DebeziumTransformer >> > --spark-master yarn-cluster --source-limit 5000 >> > >> > Basically I have not passed any SchemaProvider class in the command. >> When I >> > run the above command, I get the below exception in SourceFormatAdapter >> and >> > the job gets killed - >> > >> > java.lang.NullPointerException >> > at >> > >> > >> com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94) >> > at >> > >> > >> com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224) >> > at >> > >> > >> com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504) >> > >> > In HoodieDeltaStreamer class, we try to initiate RowBasedSchemaProvider >> > before registering Avro Schemas if the schemaProvider variable is null. >> > Hence I am trying to understand if the above exception is expected >> > behaviour. >> > >> > Please help. >> > >> > >
Re: [BUG] Null Pointer Exception in SourceFormatAdapter
+1 DeltaStreamer can be much nicer in such cases.. Any interest in opening a JIRA/PR for this? On Mon, Sep 16, 2019 at 2:02 AM [email protected] wrote: > Yes, It makes sense to add validations with descriptive messages. Please > open a ticket and send a PR for this. > Thanks,Balaji.VOn Monday, September 16, 2019, 01:11:12 AM PDT, > Pratyaksh Sharma wrote: > > Hi Balaji, > > I get your point. However I feel in such cases, instead of throwing a Null > Pointer, we should handle the case gracefully. The exception should be > thrown with proper user-facing message. Please let me know your thoughts > on this. > > On Fri, Sep 13, 2019 at 7:26 PM Balaji Varadarajan > wrote: > > > Hi Pratyaksh, > > This is expected. You need to pass a schema-provider since you are using > > Avro Sources.For RowBased sources, DeltaStreamer can deduce schema from > Row > > type information available from Spark Dataset. > > Balaji.V > >On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma < > > [email protected]> wrote: > > > > Hi, > > > > I am trying to build a CDC pipeline using Hudi working on tag > hoodie-0.4.7. > > Here is the command I used for running DeltaStreamer - > > > > spark-submit --files jaas.conf --conf > > > 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > > --conf > > > > > 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > > --master yarn --deploy-mode cluster --num-executors 2 --class > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer > > /path/to/hoodie-utilities-0.4.7.jar --storage-type COPY_ON_WRITE > > --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource > > --source-ordering-field --target-base-path hdfs://path/to/cow_table > > --target-table cow_table --props > hdfs://path/to/fg-kafka-source.properties > > --transformer-class > com.uber.hoodie.utilities.transform.DebeziumTransformer > > --spark-master yarn-cluster --source-limit 5000 > > > > Basically I have not passed any SchemaProvider class in the command. > When I > > run the above command, I get the below exception in SourceFormatAdapter > and > > the job gets killed - > > > > java.lang.NullPointerException > > at > > > > > com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94) > > at > > > > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224) > > at > > > > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504) > > > > In HoodieDeltaStreamer class, we try to initiate RowBasedSchemaProvider > > before registering Avro Schemas if the schemaProvider variable is null. > > Hence I am trying to understand if the above exception is expected > > behaviour. > > > > Please help. > > >
Re: [BUG] Null Pointer Exception in SourceFormatAdapter
Yes, It makes sense to add validations with descriptive messages. Please open a ticket and send a PR for this. Thanks,Balaji.VOn Monday, September 16, 2019, 01:11:12 AM PDT, Pratyaksh Sharma wrote: Hi Balaji, I get your point. However I feel in such cases, instead of throwing a Null Pointer, we should handle the case gracefully. The exception should be thrown with proper user-facing message. Please let me know your thoughts on this. On Fri, Sep 13, 2019 at 7:26 PM Balaji Varadarajan wrote: > Hi Pratyaksh, > This is expected. You need to pass a schema-provider since you are using > Avro Sources.For RowBased sources, DeltaStreamer can deduce schema from Row > type information available from Spark Dataset. > Balaji.V > On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma < > [email protected]> wrote: > > Hi, > > I am trying to build a CDC pipeline using Hudi working on tag hoodie-0.4.7. > Here is the command I used for running DeltaStreamer - > > spark-submit --files jaas.conf --conf > 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > --conf > > 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > --master yarn --deploy-mode cluster --num-executors 2 --class > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer > /path/to/hoodie-utilities-0.4.7.jar --storage-type COPY_ON_WRITE > --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource > --source-ordering-field --target-base-path hdfs://path/to/cow_table > --target-table cow_table --props hdfs://path/to/fg-kafka-source.properties > --transformer-class com.uber.hoodie.utilities.transform.DebeziumTransformer > --spark-master yarn-cluster --source-limit 5000 > > Basically I have not passed any SchemaProvider class in the command. When I > run the above command, I get the below exception in SourceFormatAdapter and > the job gets killed - > > java.lang.NullPointerException > at > > com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94) > at > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224) > at > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504) > > In HoodieDeltaStreamer class, we try to initiate RowBasedSchemaProvider > before registering Avro Schemas if the schemaProvider variable is null. > Hence I am trying to understand if the above exception is expected > behaviour. > > Please help. >
Re: [BUG] Null Pointer Exception in SourceFormatAdapter
Hi Balaji, I get your point. However I feel in such cases, instead of throwing a Null Pointer, we should handle the case gracefully. The exception should be thrown with proper user-facing message. Please let me know your thoughts on this. On Fri, Sep 13, 2019 at 7:26 PM Balaji Varadarajan wrote: > Hi Pratyaksh, > This is expected. You need to pass a schema-provider since you are using > Avro Sources.For RowBased sources, DeltaStreamer can deduce schema from Row > type information available from Spark Dataset. > Balaji.V > On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma < > [email protected]> wrote: > > Hi, > > I am trying to build a CDC pipeline using Hudi working on tag hoodie-0.4.7. > Here is the command I used for running DeltaStreamer - > > spark-submit --files jaas.conf --conf > 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > --conf > > 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' > --master yarn --deploy-mode cluster --num-executors 2 --class > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer > /path/to/hoodie-utilities-0.4.7.jar --storage-type COPY_ON_WRITE > --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource > --source-ordering-field --target-base-path hdfs://path/to/cow_table > --target-table cow_table --props hdfs://path/to/fg-kafka-source.properties > --transformer-class com.uber.hoodie.utilities.transform.DebeziumTransformer > --spark-master yarn-cluster --source-limit 5000 > > Basically I have not passed any SchemaProvider class in the command. When I > run the above command, I get the below exception in SourceFormatAdapter and > the job gets killed - > > java.lang.NullPointerException > at > > com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94) > at > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224) > at > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504) > > In HoodieDeltaStreamer class, we try to initiate RowBasedSchemaProvider > before registering Avro Schemas if the schemaProvider variable is null. > Hence I am trying to understand if the above exception is expected > behaviour. > > Please help. >
Re: [BUG] Null Pointer Exception in SourceFormatAdapter
Hi Pratyaksh, This is expected. You need to pass a schema-provider since you are using Avro Sources.For RowBased sources, DeltaStreamer can deduce schema from Row type information available from Spark Dataset. Balaji.V On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma wrote: Hi, I am trying to build a CDC pipeline using Hudi working on tag hoodie-0.4.7. Here is the command I used for running DeltaStreamer - spark-submit --files jaas.conf --conf 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' --conf 'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' --master yarn --deploy-mode cluster --num-executors 2 --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer /path/to/hoodie-utilities-0.4.7.jar --storage-type COPY_ON_WRITE --source-class com.uber.hoodie.utilities.sources.AvroKafkaSource --source-ordering-field --target-base-path hdfs://path/to/cow_table --target-table cow_table --props hdfs://path/to/fg-kafka-source.properties --transformer-class com.uber.hoodie.utilities.transform.DebeziumTransformer --spark-master yarn-cluster --source-limit 5000 Basically I have not passed any SchemaProvider class in the command. When I run the above command, I get the below exception in SourceFormatAdapter and the job gets killed - java.lang.NullPointerException at com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:94) at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:224) at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:504) In HoodieDeltaStreamer class, we try to initiate RowBasedSchemaProvider before registering Avro Schemas if the schemaProvider variable is null. Hence I am trying to understand if the above exception is expected behaviour. Please help.
