[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299651#comment-17299651
 ] 

ASF GitHub Bot commented on PARQUET-1992:
-

gszadovszky closed pull request #877:
URL: https://github.com/apache/parquet-mr/pull/877


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Cannot build from tarball because of git submodules
> ---
>
> Key: PARQUET-1992
> URL: https://issues.apache.org/jira/browse/PARQUET-1992
> Project: Parquet
> Issue Type: Bug
> Reporter: Gabor Szadovszky
> Priority: Blocker
>
> Because we use git submodules (to get test parquet files), a simple "mvn clean 
> install" fails from the unpacked tarball with "not a git repository".
> I think we have 2 options to solve this situation:
> * Include all the required files (even those needed only for testing) in the 
> tarball and somehow skip the git submodule update when the build is executed 
> in a non-git environment
> * Make the downloading of the parquet files and the related tests optional so 
> they won't fail the build from the tarball



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299650#comment-17299650
 ] 

ASF GitHub Bot commented on PARQUET-1992:
-

gszadovszky commented on pull request #877:
URL: https://github.com/apache/parquet-mr/pull/877#issuecomment-796827049


   Closing this one in favor of #878.





[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299648#comment-17299648
 ] 

ASF GitHub Bot commented on PARQUET-1992:
-

gszadovszky merged pull request #878:
URL: https://github.com/apache/parquet-mr/pull/878


   





[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298694#comment-17298694
 ] 

ASF GitHub Bot commented on PARQUET-1992:
-

gszadovszky commented on a change in pull request #878:
URL: https://github.com/apache/parquet-mr/pull/878#discussion_r591264845



##
File path: .gitignore
##
@@ -19,3 +19,4 @@ target/
 .cache
 *~
 mvn_install.log
+parquet-hadoop/parquet-testing

Review comment:
   Since you are using `target` now, this change is unnecessary.







[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297618#comment-17297618
 ] 

ASF GitHub Bot commented on PARQUET-1992:
-

gszadovszky commented on pull request #877:
URL: https://github.com/apache/parquet-mr/pull/877#issuecomment-792959931


   Thanks a lot for taking the time to implement two separate solutions. I've 
reviewed the other PR already. Let me postpone (or cancel) the review of this one 
until we all agree on whether the other one is the preferred one.





[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297608#comment-17297608
 ] 

ASF GitHub Bot commented on PARQUET-1992:
-

gszadovszky commented on a change in pull request #878:
URL: https://github.com/apache/parquet-mr/pull/878#discussion_r589627173



##
File path: parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryptionOptions.java
##
@@ -299,13 +309,15 @@ public void testWriteReadEncryptedParquetFiles() throws IOException {
     testReadEncryptedParquetFiles(rootPath, DATA);
   }
 
-  @Test
-  public void testInteropReadEncryptedParquetFiles() throws IOException {
+  public void testInteropReadEncryptedParquetFiles(ErrorCollector errorCollector, OkHttpClient httpClient) throws IOException {

Review comment:
   Please add a note that this method is deliberately not annotated with 
`@Test` and is called from elsewhere.

##
File path: parquet-hadoop/src/test/java/org/apache/parquet/hadoop/ITTestEncryptionOptions.java
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import org.junit.Test;
+import org.junit.Rule;
+import org.junit.rules.ErrorCollector;
+
+import okhttp3.OkHttpClient;
+
+
+import java.io.IOException;
+
+/*
+ * This file continues the testing in TestEncryptionOptions. This test goals:
+ * 4) Perform interoperability tests with other (eg parquet-cpp) writers, by reading

Review comment:
   The number `4` does not make too much sense here.

##
File path: .gitignore
##
@@ -19,3 +19,4 @@ target/
 .cache
 *~
 mvn_install.log
+parquet-hadoop/parquet-testing

Review comment:
   I would suggest either using a temporary directory outside of the source 
tree or placing the files inside the `target` directory. `target` would have 
the benefit that it is not cleaned until a clean is explicitly invoked. 
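
The `target` suggestion could look roughly like the sketch below. This is only an illustration under assumptions, not the code from the PR: the class name `TargetCache`, the `Source` interface, and the `target/parquet-testing` sub-directory are made up here. The idea is that a file is fetched once and then reused on later runs until `mvn clean` removes `target`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of parquet-mr.
final class TargetCache {

  // Abstracts "where the bytes come from" (e.g. an HTTP download).
  interface Source {
    InputStream open() throws IOException;
  }

  // Returns the cached copy under <moduleDir>/target/parquet-testing,
  // fetching it from the source only if it is not there yet.
  static Path fetchOnce(Path moduleDir, String fileName, Source source) throws IOException {
    // Assumed cache location; any directory below target/ would do.
    Path cacheDir = moduleDir.resolve("target").resolve("parquet-testing");
    Files.createDirectories(cacheDir);
    Path cached = cacheDir.resolve(fileName);
    if (Files.notExists(cached)) {
      try (InputStream in = source.open()) {
        Files.copy(in, cached);
      }
    }
    return cached;
  }

  public static void main(String[] args) throws IOException {
    Path module = Files.createTempDirectory("module");
    Path file = fetchOnce(module, "sample.bin",
        () -> new java.io.ByteArrayInputStream(new byte[] {1, 2, 3}));
    System.out.println("cached at " + file);
  }
}
```

Because the cache lives below `target`, no `.gitignore` entry is needed for it.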







[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-08 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297218#comment-17297218
 ] 

Gabor Szadovszky commented on PARQUET-1992:
---

[~mayaa], 

bq. Benefit - the regular dev flow of building and running unit tests won't 
require downloading files and connectivity to github
We already need to download a bunch of files from the internet (maven plugins 
and dependencies). So even building/testing from the tarball requires 
downloading.

bq. If so, they could be run by maven-failsafe-plugin as part of the 
integration-test/verify phase and missing the interop files would not fail "mvn 
install" but only "mvn verify"
AFAIK the failsafe plugin is configured to be executed at {{mvn verify}}, and 
since {{install}} comes after the {{verify}} phase, {{mvn install}} would still 
fail if the integration tests could not be executed. BTW, we already have an 
integration test: 
[FileEncodingsIT|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/encodings/FileEncodingsIT.java].
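
To illustrate the phase ordering: a minimal Java sketch, assuming a simplified subset of Maven's default lifecycle (the real lifecycle has more phases, and the class and method names are made up for this example). It only shows that requesting {{install}} implies every earlier phase, including {{integration-test}} and {{verify}}, while {{package}} stops before them.

```java
import java.util.List;

// Illustration only; this is not how Maven itself is implemented.
final class LifecycleOrder {

  // Subset of Maven's default lifecycle, in execution order.
  static final List<String> PHASES = List.of(
      "validate", "initialize", "compile", "test", "package",
      "integration-test", "verify", "install", "deploy");

  // Returns all phases that are executed when the given phase is requested.
  static List<String> phasesRunBy(String requested) {
    int index = PHASES.indexOf(requested);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown phase: " + requested);
    }
    return PHASES.subList(0, index + 1);
  }

  public static void main(String[] args) {
    System.out.println("mvn package runs: " + phasesRunBy("package"));
    System.out.println("mvn install runs: " + phasesRunBy("install"));
  }
}
```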

bq. 2. Should the files for interop tests be downloaded directly in the test or 
using submodules in a separate maven profile for integration-test or as part of 
an existing profile, e.g. ci-test?
I think there is another option: downloading the required files directly from 
maven. I am not sure which plugin is capable of this, or whether it is better than 
downloading from the test with java code, but it is still an option.

bq. Git submodules provides flows for handling downloaded file versions - 
specific to a commit or a branch.
A github download link can contain the hash of the changeset, so it is also 
capable of handling file versions.

bq. Git submodules manages downloading files only when needed
This is not true in the current situation. We are invoking {{git submodule 
update}} in the {{initialize}} phase of maven. So we are downloading the 
whole {{parquet-testing}} repo (at a specific changeset) at least once.

bq. It is aligned with the integration tests in parquet-cpp (arrow)
How does parquet-cpp solve the similar issue with the tarball?

bq. The files can be used for additional interop tests of other features
I agree, this was the first thing I liked about git submodules. Meanwhile, I've 
started thinking about implementing interoperability tests, and now I think such 
tests could be implemented in the {{parquet-testing}} repo, as they do not 
require low-level access to the {{parquet-mr}} classes like unit tests do. My 
fear about git submodules is that the {{parquet-testing}} repo might grow 
big, and AFAIK you cannot control which files/directories to sync, 
only the changeset.

bq. The tarball still won't contain the interop files, so the integration tests 
will fail on it.
I think we should not add the parquet files into the source tarball in any way. 

bq. Anyway, both ways are acceptable, so I'll implement whatever sounds best to 
the community.
I currently agree with [~sha...@uber.com] about downloading the required files. 
Meanwhile, I am curious about the parquet-cpp solution.

bq. BTW, when investigating the profiles, it seems to me that there is an old 
reference to the "travis" maven profile mentioned in the .travis.yml file, 
though its new name is "ci-test".
That's a good catch! We'll fix it.



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-05 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296124#comment-17296124
 ] 

Xinli Shang commented on PARQUET-1992:
--

I think the build should never fail when developers run 'mvn package/install' or 
'mvn verify' without having made any changes. So I like the idea of 
downloading directly. I will review the code once it passes the build. 



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296064#comment-17296064
 ] 

ASF GitHub Bot commented on PARQUET-1992:
-

andersonm-ibm opened a new pull request #878:
URL: https://github.com/apache/parquet-mr/pull/878


   This PR is another option for solving the problem "Cannot build from tarball 
because of git submodules".
   1. Encryption interop tests run separately from unit tests, in the 
integration-test phase 
   2. Files for interop tests are downloaded manually, so git submodules are 
totally removed from the project
   
   - `mvn package/install` - doesn't run interop tests
   - `mvn verify` - runs interop tests and downloads the interop files from 
GitHub directly





[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-03 Thread Maya Anderson (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294676#comment-17294676
 ] 

Maya Anderson commented on PARQUET-1992:


[~gszadovszky] , I would say that there are 2 questions here:

1. Should the interop tests run separately from unit tests?
 * Benefit - the regular dev flow of building and running unit tests won't 
require downloading files and connectivity to github
 * If so, they could be run by {{maven-failsafe-plugin}} as part of the 
integration-test/verify phase and missing the interop files would not fail 
"{{mvn install}}" but only "{{mvn verify}}"

2. Should the files for interop tests be downloaded directly in the test or 
using submodules in a separate maven profile for integration-test or as part of 
an existing profile, e.g. {{ci-test}}?

I see the following advantages of the submodule approach:
 * Git submodules provide flows for handling downloaded file versions - 
specific to a commit or a branch.
 * Git submodules manage downloading files only when needed
 * It is aligned with the integration tests in parquet-cpp (arrow)
 * The files can be used for additional interop tests of other features

Disadvantages:
 * The tarball still won't contain the interop files, so the integration tests 
will fail on it. However, if interop tests are separate from unit tests, then 
maybe it shouldn't be a problem?

Anyway, both ways are acceptable, so I'll implement whatever sounds best to the 
community.

 

BTW, when investigating the profiles, it seems to me that there is an old 
reference to the "travis" maven profile mentioned in the .travis.yml file, 
though its new name is "ci-test". 



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-02 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293788#comment-17293788
 ] 

Gabor Szadovszky commented on PARQUET-1992:
---

[~mayaa], the {{integration-test}} you've referenced is a goal of the 
{{maven-failsafe-plugin}}, not a profile reference. Anyway, one option is 
not to execute these tests (and the related git submodule update) in the default 
profiles. Another way would be to download the required parquet files in a way 
that also works when you are not in a git repository (the extracted tarball). One 
easy way is to use the direct github links of the actual files (e.g. 
https://github.com/apache/parquet-testing/raw/40379b3/data/encrypt_columns_and_footer.parquet.encrypted).
 I think downloading the files has some benefits over updating the whole 
git submodule. But I am open to discussion. 
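
A direct link of that shape can be assembled from the repository name, a changeset hash, and the file path. The following is a minimal Java sketch under assumptions: the class and method names are hypothetical, and only the URL pattern is taken from the example link above. The actual download is deliberately left out.

```java
import java.net.URI;

// Hypothetical helper, not part of parquet-mr.
final class PinnedTestFile {

  // Builds a GitHub "raw" link pinned to one changeset, so the file
  // version is fixed without needing a git checkout.
  static URI rawUrl(String repo, String commit, String path) {
    return URI.create("https://github.com/" + repo + "/raw/" + commit + "/" + path);
  }

  public static void main(String[] args) {
    System.out.println(rawUrl(
        "apache/parquet-testing",
        "40379b3",
        "data/encrypt_columns_and_footer.parquet.encrypted"));
  }
}
```

Pinning the hash in the link keeps the test input stable even if the {{parquet-testing}} repo changes later.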



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-02 Thread Maya Anderson (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293680#comment-17293680
 ] 

Maya Anderson commented on PARQUET-1992:


[~gszadovszky] and [~junjie], how about moving the integration tests into a 
separate integration-test profile, somewhat similar to the one described in 
[https://www.petrikainulainen.net/programming/maven/integration-testing-with-maven/]?

I see that we already have a reference to this currently non-existing profile:

https://github.com/apache/parquet-mr/blame/2c6ceb330bbfd282715730b478714b84418c0749/pom.xml#L411



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-02 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293536#comment-17293536
 ] 

Gabor Szadovszky commented on PARQUET-1992:
---

Some more info about the tar file generation. It is generated by the script 
[dev/source-release.sh|https://github.com/apache/parquet-mr/blob/master/dev/source-release.sh#L57].
 The command {{git archive}} is used. It seems that {{git archive}} does not 
include the contents of git submodules.
However, this is not necessarily a bad thing. Currently the whole repository of 
parquet-testing is cloned. This is not a big deal because it is currently only 
136K. But we are planning to extend that repo, and we can never know 
when someone will upload files for testing something that is unrelated to 
parquet-mr. Also, the content of parquet-testing is not something we would like 
to include in our source tarball.

In summary, we need a method for downloading the required parquet files that 
works both from the git repo (during development or in CI) and 
from the unpacked source tarball.



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-02 Thread Gidon Gershinsky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293509#comment-17293509
 ] 

Gidon Gershinsky commented on PARQUET-1992:
---

This contribution was added by [~mayaa]; she knows the subject better than 
I do. Maya, could you address the comments and the question in this jira?



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-01 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293475#comment-17293475
 ] 

Gabor Szadovszky commented on PARQUET-1992:
---

That's a good idea, [~junjie].

Another option would be to download the required parquet files using github 
direct links. We are downloading our dependencies during the build anyway, so 
downloading some additional files required for testing should be 
acceptable. 



[jira] [Commented] (PARQUET-1992) Cannot build from tarball because of git submodules

2021-03-01 Thread Junjie Chen (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293285#comment-17293285
 ] 

Junjie Chen commented on PARQUET-1992:
--

How about making the related tests required in Travis CI and optional in the 
maven build?
