[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-02-17 Thread GitBox
pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add 
support for ingesting multiple kafka streams in a single DeltaStreamer 
deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r380490349
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -80,16 +79,27 @@
   + "{\"name\": \"begin_lat\", \"type\": \"double\"},{\"name\": 
\"begin_lon\", \"type\": \"double\"},"
   + "{\"name\": \"end_lat\", \"type\": \"double\"},{\"name\": \"end_lon\", 
\"type\": \"double\"},"
   + "{\"name\":\"fare\",\"type\": \"double\"}]}";
+  public static String GROCERY_PURCHASE_SCHEMA = 
"{\"type\":\"record\",\"name\":\"purchaserec\",\"fields\":["
 
 Review comment:
   OK, got the point. I will get back to you on this, @vinothchandar.
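
   For reference, a minimal sketch of how such an inline Avro test schema can
   be defined and sanity-checked. The grocery-purchase field names below are
   purely illustrative, since the diff truncates the field list:

   import org.apache.avro.Schema;

   public class GrocerySchemaSketch {
     // Hypothetical completion of the truncated schema string; field names are illustrative.
     public static final String GROCERY_PURCHASE_SCHEMA =
         "{\"type\":\"record\",\"name\":\"purchaserec\",\"fields\":["
             + "{\"name\": \"timestamp\", \"type\": \"double\"},"
             + "{\"name\": \"_row_key\", \"type\": \"string\"},"
             + "{\"name\": \"amount\", \"type\": \"double\"}]}";

     public static void main(String[] args) {
       // Schema.Parser fails fast if the concatenated JSON is malformed.
       Schema schema = new Schema.Parser().parse(GROCERY_PURCHASE_SCHEMA);
       System.out.println(schema.getFields());
     }
   }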


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] nsivabalan commented on issue #1176: [WIP] [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-02-17 Thread GitBox
nsivabalan commented on issue #1176: [WIP] [HUDI-430] Adding InlineFileSystem 
to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#issuecomment-587272207
 
 
   @vinothchandar: I have fixed the build failures. The patch is ready for review. 




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #192

2020-02-17 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.29 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 'HUDI_home=0.5.2-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.5.2-SNAPSHOT
[WARNING] 'artifactId' contains an 

[incubator-hudi] branch hudi_test_suite_refactor updated (fa07df4 -> a31a8f6)

2020-02-17 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard fa07df4  trigger rebuild
 discard bb58a5f  [MINOR] Fix compile error after rebasing the branch
 add a31a8f6  [MINOR] Fix compile error after rebasing the branch

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (fa07df4)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (a31a8f6)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 hudi-test-suite/pom.xml                                        |  4 ++--
 .../hudi/testsuite/reader/DFSHoodieDatasetInputReader.java     | 10 +-
 .../org/apache/hudi/testsuite/job/TestHoodieTestSuiteJob.java  |  8 
 .../test/java/org/apache/hudi/testsuite/utils/TestUtils.java   |  4 ++--
 packaging/hudi-test-suite-bundle/pom.xml                       |  4 ++--
 5 files changed, 15 insertions(+), 15 deletions(-)



[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1330: [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date type columns

2020-02-17 Thread GitBox
umehrot2 commented on a change in pull request #1330: [HUDI-607] Fix to allow 
creation/syncing of Hive tables partitioned by Date type columns
URL: https://github.com/apache/incubator-hudi/pull/1330#discussion_r380346318
 
 

 ##
 File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
 ##
 @@ -77,6 +80,11 @@ public static Object getNestedFieldVal(GenericRecord 
record, String fieldName, b
 
   // return, if last part of name
   if (i == parts.length - 1) {
+
+if (isLogicalTypeDate(valueNode, part)) {
+  return LocalDate.ofEpochDay(Long.parseLong(val.toString()));
 
 Review comment:
   From my understanding, Spark's `DateType` does not carry a timezone; it 
should be using the local time zone. The same goes for `LocalDate`.
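
   For context, Avro's `date` logical type stores a count of days since the
   epoch, which is why the `LocalDate.ofEpochDay` conversion above involves no
   timezone at all. A quick illustration:

   import java.time.LocalDate;

   public class EpochDayDemo {
     public static void main(String[] args) {
       // 18309 days after 1970-01-01; no timezone participates in the conversion.
       long epochDay = 18309L;
       LocalDate date = LocalDate.ofEpochDay(epochDay);
       System.out.println(date); // prints 2020-02-17
     }
   }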




[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1330: [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date type columns

2020-02-17 Thread GitBox
umehrot2 commented on a change in pull request #1330: [HUDI-607] Fix to allow 
creation/syncing of Hive tables partitioned by Date type columns
URL: https://github.com/apache/incubator-hudi/pull/1330#discussion_r380339707
 
 

 ##
 File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
 ##
 @@ -77,6 +80,11 @@ public static Object getNestedFieldVal(GenericRecord 
record, String fieldName, b
 
   // return, if last part of name
   if (i == parts.length - 1) {
+
+if (isLogicalTypeDate(valueNode, part)) {
 
 Review comment:
   This method seems to me like the right central place to perform this check 
and conversion. If we look at the callers of this method, Hudi uses it to 
retrieve the key values for its metadata fields. It would be more consistent 
for Hudi to treat a Date as the actual date string rather than a Long across 
all its keys internally; otherwise it would create a lot of confusion, and 
every caller of this function would have to add this check itself.
   
   Also, we would have to re-write the same logic/loop on the client side to 
check for the logical type, because this function returns only the value.
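
   For readers following along, a sketch of what such a date logical-type check
   can look like with the Avro API; the actual isLogicalTypeDate in the PR may
   differ in signature and in how it unwraps nullable (union) fields:

   import org.apache.avro.LogicalType;
   import org.apache.avro.LogicalTypes;
   import org.apache.avro.Schema;

   public class LogicalTypeCheckSketch {
     // Returns true if the named field carries Avro's `date` logical type.
     // Nullable fields (unions with "null") would need unwrapping first.
     static boolean isLogicalTypeDate(Schema recordSchema, String fieldName) {
       Schema fieldSchema = recordSchema.getField(fieldName).schema();
       LogicalType logicalType = fieldSchema.getLogicalType();
       return logicalType instanceof LogicalTypes.Date;
     }
   }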




[GitHub] [incubator-hudi] bvaradar commented on issue #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-02-17 Thread GitBox
bvaradar commented on issue #1150: [HUDI-288]: Add support for ingesting 
multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#issuecomment-587113088
 
 
   @pratyakshsharma: Let me know if you need any clarification.




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-02-17 Thread GitBox
bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support 
for ingesting multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r380322128
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/TableConfig.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
+import com.fasterxml.jackson.annotation.JsonProperty;
+
+import java.util.Objects;
+
+/*
+ * Represents an object with all the topic-level overrides for a multi-table
+ * delta streamer execution.
+ */
+@JsonIgnoreProperties(ignoreUnknown = true)
 
 Review comment:
   In TableConfig.java, there are Kafka-related configs and configs like 
KeyGeneratorClass that relate to the upstream source, whereas configs related 
to hive-sync and others apply to the written target dataset. My understanding 
is that TableConfig comes from DFSProperties. IMO, decoupling the source 
config and the table config will be cleaner. Let me know if there are any 
implementation difficulties around it. 
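
   One way to read the decoupling suggestion, sketched with hypothetical field
   names (the excerpt does not show TableConfig's actual fields):

   import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
   import com.fasterxml.jackson.annotation.JsonProperty;

   // Settings describing the upstream source (Kafka topic, key generation).
   @JsonIgnoreProperties(ignoreUnknown = true)
   class SourceConfig {
     @JsonProperty("topic")
     private String topic;
     @JsonProperty("keyGeneratorClass")
     private String keyGeneratorClass;
   }

   // Settings describing the written target dataset (hive-sync and friends).
   @JsonIgnoreProperties(ignoreUnknown = true)
   class TargetTableConfig {
     @JsonProperty("hiveSyncEnabled")
     private boolean hiveSyncEnabled;
     @JsonProperty("targetBasePath")
     private String targetBasePath;
   }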




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-02-17 Thread GitBox
bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support 
for ingesting multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r380318911
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/util/DFSTablePropertiesConfiguration.java
 ##
 @@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.util;
+
+import org.apache.hudi.common.model.TableConfig;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Used for parsing custom files containing TableConfig objects.
+ */
+public class DFSTablePropertiesConfiguration {
 
 Review comment:
   Yes
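
   For readers following along, a minimal sketch of the DFS line-reading
   pattern the imports in the excerpt suggest; how each line maps onto a
   TableConfig is not shown in the diff and is elided here:

   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   import java.io.BufferedReader;
   import java.io.IOException;
   import java.io.InputStreamReader;
   import java.util.ArrayList;
   import java.util.List;

   public class DfsLineReaderSketch {
     // Reads a config file from DFS line by line; parsing is left out.
     static List<String> readLines(FileSystem fs, Path file) throws IOException {
       List<String> lines = new ArrayList<>();
       try (BufferedReader reader =
           new BufferedReader(new InputStreamReader(fs.open(file)))) {
         String line;
         while ((line = reader.readLine()) != null) {
           lines.add(line);
         }
       }
       return lines;
     }
   }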




[jira] [Updated] (HUDI-616) Parquet files not getting created on DFS docker instance but on local FS in TestHoodieDeltaStreamer

2020-02-17 Thread Pratyaksh Sharma (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pratyaksh Sharma updated HUDI-616:
--
Status: In Progress  (was: Open)

> Parquet files not getting created on DFS docker instance but on local FS in 
> TestHoodieDeltaStreamer
> ---
>
> Key: HUDI-616
> URL: https://issues.apache.org/jira/browse/HUDI-616
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer, Testing
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.5.2
>
>
> In TestHoodieDeltaStreamer, PARQUET_SOURCE_ROOT gets initialised even before 
> the function annotated with @BeforeClass gets called, as below:
> private static final String PARQUET_SOURCE_ROOT = dfsBasePath + 
> "/parquetFiles";
> At this point the dfsBasePath variable is still null, so the parquet files 
> get created on the local FS and need to be cleared manually after testing. 
> This needs to be rectified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-616) Parquet files not getting created on DFS docker instance but on local FS in TestHoodieDeltaStreamer

2020-02-17 Thread Pratyaksh Sharma (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pratyaksh Sharma updated HUDI-616:
--
Status: Open  (was: New)



[jira] [Created] (HUDI-616) Parquet files not getting created on DFS docker instance but on local FS in TestHoodieDeltaStreamer

2020-02-17 Thread Pratyaksh Sharma (Jira)
Pratyaksh Sharma created HUDI-616:
-

 Summary: Parquet files not getting created on DFS docker instance 
but on local FS in TestHoodieDeltaStreamer
 Key: HUDI-616
 URL: https://issues.apache.org/jira/browse/HUDI-616
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: DeltaStreamer, Testing
Reporter: Pratyaksh Sharma
Assignee: Pratyaksh Sharma
 Fix For: 0.5.2


In TestHoodieDeltaStreamer, PARQUET_SOURCE_ROOT gets initialised even before 
the function annotated with @BeforeClass gets called, as below:

private static final String PARQUET_SOURCE_ROOT = dfsBasePath + "/parquetFiles";

At this point the dfsBasePath variable is still null, so the parquet files get 
created on the local FS and need to be cleared manually after testing. This 
needs to be rectified.
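
A minimal sketch of a likely fix, assuming the field can be made non-final and 
assigned inside the @BeforeClass hook (the HDFS URI below is a hypothetical 
stand-in for whatever the docker DFS fixture provides):

import org.junit.BeforeClass;

public class TestHoodieDeltaStreamer {
  private static String dfsBasePath;
  // Deferred: assigned once dfsBasePath exists, not at class-load time.
  private static String parquetSourceRoot;

  @BeforeClass
  public static void initClass() {
    dfsBasePath = "hdfs://localhost:9000/tmp"; // hypothetical docker DFS URI
    parquetSourceRoot = dfsBasePath + "/parquetFiles";
  }
}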





[jira] [Updated] (HUDI-614) .hoodie_partition_metadata created for non-partitioned table

2020-02-17 Thread Andrew Wong (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong updated HUDI-614:
-
Description: 
Original issue: [https://github.com/apache/incubator-hudi/issues/1329]

I made a non-partitioned Hudi table using Spark. I was able to query it with 
Spark & Hive, but when I tried querying it with Presto, I received the error 
{{Could not find partitionDepth in partition metafile}}.

I attempted this task using emr-5.28.0 in AWS. I tried using the built-in 
spark-shell with both Amazon's /usr/lib/hudi/hudi-spark-bundle.jar (following 
[https://aws.amazon.com/blogs/aws/new-insert-update-delete-data-on-s3-with-amazon-emr-and-apache-hudi/])
 and the org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating jar (following 
[https://hudi.apache.org/docs/quick-start-guide.html]).

I used NonpartitionedKeyGenerator & NonPartitionedExtractor in my write 
options, according to 
[https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIuseDeltaStreamerorSparkDataSourceAPItowritetoaNon-partitionedHudidataset?].
 You can see my code in the github issue linked above.

In both cases I see the .hoodie_partition_metadata file was created in the 
table path in S3. Querying the table worked in spark-shell & hive-cli, but 
attempting to query the table in presto-cli resulted in the error, "Could not 
find partitionDepth in partition metafile".

Please look into the bug or check the documentation. If there is a problem with 
the EMR install I can contact the AWS team responsible.

cc: [~bhasudha]

  was:
Original issue: [https://github.com/apache/incubator-hudi/issues/1329]

I made a non-partitioned Hudi table using Spark. I was able to query it with 
Spark & Hive, but when I tried querying it with Presto, I received the error 
{{Could not find partitionDepth in partition metafile}}.

I attempted this task using emr-5.28.0 in AWS. I tried using the built-in 
spark-shell with both Amazon's /usr/lib/hudi/hudi-spark-bundle.jar (following 
[https://aws.amazon.com/blogs/aws/new-insert-update-delete-data-on-s3-with-amazon-emr-and-apache-hudi/)]
 and the org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating jar (following 
[https://hudi.apache.org/docs/quick-start-guide.html]).

I used NonpartitionedKeyGenerator & NonPartitionedExtractor in my write 
options, according to 
[https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIuseDeltaStreamerorSparkDataSourceAPItowritetoaNon-partitionedHudidataset?].
 You can see my code in the github issue linked above.

In both cases I see the .hoodie_partition_metadata file was created in the 
table path in S3. Querying the table worked in spark-shell & hive-cli, but 
attempting to query the table in presto-cli resulted in the error, "Could not 
find partitionDepth in partition metafile".

Please look into the bug or check the documentation. If there is a problem with 
the EMR install I can contact the AWS team responsible.


> .hoodie_partition_metadata created for non-partitioned table
> 
>
> Key: HUDI-614
> URL: https://issues.apache.org/jira/browse/HUDI-614
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Affects Versions: 0.5.0, 0.5.1
>Reporter: Andrew Wong
>Priority: Major
>
> Original issue: [https://github.com/apache/incubator-hudi/issues/1329]
> I made a non-partitioned Hudi table using Spark. I was able to query it with 
> Spark & Hive, but when I tried querying it with Presto, I received the error 
> {{Could not find partitionDepth in partition metafile}}.
> I attempted this task using emr-5.28.0 in AWS. I tried using the built-in 
> spark-shell with both Amazon's /usr/lib/hudi/hudi-spark-bundle.jar (following 
> [https://aws.amazon.com/blogs/aws/new-insert-update-delete-data-on-s3-with-amazon-emr-and-apache-hudi/)]
>  and the org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating jar 
> (following [https://hudi.apache.org/docs/quick-start-guide.html]).
> I used NonpartitionedKeyGenerator & NonPartitionedExtractor in my write 
> options, according to 
> [https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIuseDeltaStreamerorSparkDataSourceAPItowritetoaNon-partitionedHudidataset?].
>  You can see my code in the github issue linked above.
> In both cases I see the .hoodie_partition_metadata file was created in the 
> table path in S3. Querying the table worked in spark-shell & hive-cli, but 
> attempting to query the table in presto-cli resulted in the error, "Could not 
> find partitionDepth in partition metafile".
> Please look into the bug or check the documentation. If there is a problem 
> with the EMR install I can contact the AWS team responsible.
> cc: [~bhasudha]
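
For anyone reproducing this, a sketch of the non-partitioned write options the 
FAQ describes; the table name, record key field, S3 path, and the exact 
key-generator package paths are placeholders that may differ across Hudi 
versions:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class NonPartitionedWriteSketch {
  static void writeNonPartitioned(Dataset<Row> df) {
    df.write().format("org.apache.hudi")
        .option("hoodie.table.name", "my_table")                 // placeholder
        .option("hoodie.datasource.write.recordkey.field", "id") // placeholder
        .option("hoodie.datasource.write.keygenerator.class",
            "org.apache.hudi.keygen.NonpartitionedKeyGenerator") // package may vary by version
        .option("hoodie.datasource.hive_sync.partition_extractor_class",
            "org.apache.hudi.hive.NonPartitionedExtractor")
        .mode(SaveMode.Append)
        .save("s3://bucket/path/my_table");                      // placeholder
  }
}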





[jira] [Updated] (HUDI-614) .hoodie_partition_metadata created for non-partitioned table

2020-02-17 Thread Andrew Wong (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong updated HUDI-614:
-
Component/s: (was: DeltaStreamer)



[jira] [Updated] (HUDI-614) .hoodie_partition_metadata created for non-partitioned table

2020-02-17 Thread Andrew Wong (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong updated HUDI-614:
-
Component/s: DeltaStreamer



[jira] [Updated] (HUDI-614) .hoodie_partition_metadata created for non-partitioned table

2020-02-17 Thread Andrew Wong (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong updated HUDI-614:
-
Affects Version/s: 0.5.0
   0.5.1
