Re: Build error: package does not exist
Hi Hynek,

I used `mvn clean install -DskipTests -Drat.skip=true` in the root directory. I built the release-1.9 branch, and it took around 1 hour 23 minutes to complete with no errors.

Regards,
Pritam.

On Tue, 29 Oct 2019 at 01:02, Hynek Noll wrote:

> And I've tried just
> `mvn clean install -DskipTests -Drat.skip=true -DskipITs`
> as well. It takes around half an hour, so I'm not too keen to try all the
> possibilities.
>
> I guess there might be some other SDKs/libraries that I'm missing and that
> Maven won't tell me about? Or just some random incompatibility?
>
> Thanks for any tips,
> Hynek
>
> On Mon, 28 Oct 2019 at 20:24, Hynek Noll wrote:
>
>> Dear Pritam,
>> I've tried that as well; specifically, I ran:
>> `mvn clean install -DskipTests -Drat.skip=true -DskipITs -Pinclude-kinesis -Daws.kinesis-kpl.version=0.12.6`
>> But the result is still the same. During the build, the packages that I
>> suppose should be generated by Maven from Amazon Kinesis are missing.
>>
>> Best regards,
>> Hynek
>>
>> On Mon, 28 Oct 2019 at 19:57, Pritam Sadhukhan <sadhukhan.pri...@gmail.com> wrote:
>>
>>> Hi Hynek,
>>>
>>> Please run `mvn clean install -DskipTests -Drat.skip=true`.
>>>
>>> It should build properly, but it takes time.
>>>
>>> Regards
>>>
>>> On Mon, Oct 28, 2019, 10:06 PM Hynek Noll wrote:
>>>
>>>> Hi Bruce and Jark,
>>>> Thank you for the tip, but I had already done something similar by
>>>> clicking "Generate Sources and Update Folders". I tried the suggested
>>>> command(s), but without success, unfortunately.
>>>> Executing `mvn clean install -DskipTests` resulted in an error: "Too many
>>>> files with unapproved license: 2 See RAT report ...". (In the report it
>>>> states that the two files are:
>>>> flink-core/src/test/resources/abstractID-with-toString-field
>>>> flink-core/src/test/resources/abstractID-with-toString-field-set
>>>> ) Meanwhile, `mvn clean package -DskipTests` actually runs for 30+ minutes
>>>> (much longer than the first command, but maybe that one stops early
>>>> because of the error) and finishes fine, yet I have the same problems
>>>> afterwards.
>>>>
>>>> I've tried switching to Maven 3.1.1 (from 3.6.1).
>>>>
>>>> Now, the one thing that resolved the missing package stated above was
>>>> switching to the Scala 2.11.12 SDK (instead of 2.12)!
>>>>
>>>> The steps I've been taking (within IntelliJ) were: Invalidate Caches &
>>>> Restart (alternatively, exit and delete the .idea folder) -> open IntelliJ
>>>> again -> Maven Reimport -> Maven Generate Sources and Update Folders ->
>>>> Build Project. That results in further package(s) missing:
>>>>
>>>> *Error:(21, 53) java: package org.apache.flink.kinesis.shaded.com.amazonaws
>>>> does not exist*
>>>>
>>>> Maybe it now has to do with just the dependency shading?
>>>>
>>>> Best regards,
>>>> Hynek
>>>>
>>>> On Sun, 27 Oct 2019 at 15:02, Jark Wu wrote:
>>>>
>>>>> Hi Hynek,
>>>>>
>>>>> Bruce is right: you should build the Flink source code first, before
>>>>> developing, with `mvn clean package -DskipTests` in the root directory
>>>>> of Flink. This may take 10 minutes or more, depending on your machine.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Sun, 27 Oct 2019 at 20:46, yanjun qiu wrote:
>>>>>
>>>>>> Hi Hynek,
>>>>>> I think you should run the Maven build first: execute `mvn clean
>>>>>> install -DskipTests`. The Flink SQL parser uses the Apache Calcite
>>>>>> framework to generate the SQL parser source code.
>>>>>>
>>>>>> Regards,
>>>>>> Bruce
>>>>>>
>>>>>>> On 27 Oct 2019, at 12:09 AM, Hynek Noll wrote:
>>>>>>>
>>>>>>> package seems to be missing on GitHub:
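A note on the shading question raised above: the `-Pinclude-kinesis` build relocates the AWS SDK classes under the `org.apache.flink.kinesis.shaded` prefix that appears in the compiler error, so that package exists only in the shaded flink-connector-kinesis jar installed by `mvn install`, not in the raw module sources an IDE might resolve instead. A minimal sketch of code that exercises the relocated package; `ClientConfiguration` is just an illustrative AWS SDK class, not one taken from the thread:

// Probe: this only compiles when the shaded flink-connector-kinesis jar
// (built with -Pinclude-kinesis) is on the classpath, because the AWS SDK's
// com.amazonaws classes are relocated under this prefix during shading.
import org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfiguration;

public class ShadedKinesisProbe {
    public static void main(String[] args) {
        // The relocated copy of com.amazonaws.ClientConfiguration.
        ClientConfiguration config = new ClientConfiguration();
        System.out.println("Resolved shaded AWS SDK: " + config.getClass().getName());
    }
}

If this fails to compile even though the jar is installed, the IDE is likely compiling against the flink-connector-kinesis module sources rather than the shaded artifact from the local Maven repository.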
Re: Build error: package does not exist
Hi Hynek,

Please run `mvn clean install -DskipTests -Drat.skip=true`.

It should build properly, but it takes time.

Regards

On Mon, Oct 28, 2019, 10:06 PM Hynek Noll wrote:

> Hi Bruce and Jark,
> Thank you for the tip, but I had already done something similar by clicking
> "Generate Sources and Update Folders". I tried the suggested command(s),
> but without success, unfortunately.
> Executing `mvn clean install -DskipTests` resulted in an error: "Too many
> files with unapproved license: 2 See RAT report ...". (In the report it
> states that the two files are:
> flink-core/src/test/resources/abstractID-with-toString-field
> flink-core/src/test/resources/abstractID-with-toString-field-set
> ) Meanwhile, `mvn clean package -DskipTests` actually runs for 30+ minutes
> (much longer than the first command, but maybe that one stops early because
> of the error) and finishes fine, yet I have the same problems afterwards.
>
> I've tried switching to Maven 3.1.1 (from 3.6.1).
>
> Now, the one thing that resolved the missing package stated above was
> switching to the Scala 2.11.12 SDK (instead of 2.12)!
>
> The steps I've been taking (within IntelliJ) were: Invalidate Caches &
> Restart (alternatively, exit and delete the .idea folder) -> open IntelliJ
> again -> Maven Reimport -> Maven Generate Sources and Update Folders ->
> Build Project. That results in further package(s) missing:
>
> *Error:(21, 53) java: package org.apache.flink.kinesis.shaded.com.amazonaws
> does not exist*
>
> Maybe it now has to do with just the dependency shading?
>
> Best regards,
> Hynek
>
> On Sun, 27 Oct 2019 at 15:02, Jark Wu wrote:
>
>> Hi Hynek,
>>
>> Bruce is right: you should build the Flink source code first, before
>> developing, with `mvn clean package -DskipTests` in the root directory
>> of Flink. This may take 10 minutes or more, depending on your machine.
>>
>> Best,
>> Jark
>>
>> On Sun, 27 Oct 2019 at 20:46, yanjun qiu wrote:
>>
>>> Hi Hynek,
>>> I think you should run the Maven build first: execute `mvn clean install
>>> -DskipTests`. The Flink SQL parser uses the Apache Calcite framework
>>> to generate the SQL parser source code.
>>>
>>> Regards,
>>> Bruce
>>>
>>>> On 27 Oct 2019, at 12:09 AM, Hynek Noll wrote:
>>>>
>>>> package seems to be missing on GitHub:
Re: Need help on orcsourcetable with hdfs
Can anyone please help me with the conf files? Am I missing anything in the configuration?

Regards,
Pritam.

On Tue, 15 Oct 2019 at 08:48, Pritam Sadhukhan wrote:

> Thanks for the information.
>
> I am able to see all the files using the hdfs shell command.
> I am even able to pull the data in Flink with
> environment.readTextFile("hdfs://host:port/qlake/logs/sa_structured_events")
>
> The issue is only with the ORC data source implementation.
> Here are my configuration files.
>
> *flink-conf.yaml:*
>
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements. See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership. The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License. You may obtain a copy of the License at
> #
> #     http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
>
> #==============================================================================
> # Common
> #==============================================================================
>
> # The external address of the host on which the JobManager runs and can be
> # reached by the TaskManagers and any clients which want to connect. This setting
> # is only used in Standalone mode and may be overwritten on the JobManager side
> # by specifying the --host <hostname> parameter of the bin/jobmanager.sh executable.
> # In high availability mode, if you use the bin/start-cluster.sh script and setup
> # the conf/masters file, this will be taken care of automatically. Yarn/Mesos
> # automatically configure the host name based on the hostname of the node where the
> # JobManager runs.
>
> jobmanager.rpc.address: localhost
>
> # The RPC port where the JobManager is reachable.
>
> jobmanager.rpc.port: 6123
>
> # The heap size for the JobManager JVM
>
> jobmanager.heap.size: 1024m
>
> # The heap size for the TaskManager JVM
>
> taskmanager.heap.size: 1024m
>
> # The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
>
> taskmanager.numberOfTaskSlots: 8
>
> # The parallelism used for programs that did not specify any other parallelism.
>
> parallelism.default: 1
>
> # The default file system scheme and authority.
> #
> # By default file paths without scheme are interpreted relative to the local
> # root file system 'file:///'. Use this to override the default and interpret
> # relative paths relative to a different file system,
> # for example 'hdfs://mynamenode:12345'
> #
> # fs.default-scheme
>
> #==============================================================================
> # High Availability
> #==============================================================================
>
> # The high-availability mode. Possible options are 'NONE' or 'zookeeper'.
> #
> # high-availability: zookeeper
>
> # The path where metadata for master recovery is persisted. While ZooKeeper stores
> # the small ground truth for checkpoint and leader election, this location stores
> # the larger objects, like persisted dataflow graphs.
> #
> # Must be a durable file system that is accessible from all nodes
> # (like HDFS, S3, Ceph, nfs, ...)
> #
> # high-availability.storageDir: hdfs:///flink/ha/
>
> # The list of ZooKeeper quorum peers that coordinate the high-availability
> # setup. This must be a list of the form:
> # "host1:clientPort,host2:clientPort,..." (default clientPort: 2181)
> #
> # high-availability.zookeeper.quorum: localhost:2181
>
> # ACL options are based on
> # https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
> # It can be either "creator" (ZOO_CREATE_ALL_ACL) or "open" (ZOO_OPEN_ACL_UNSAFE)
> # The default value is "open" and it can be changed to "creator" if ZK security is enabled
> #
> # high-availability.zookeeper.client.acl: open
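A note on the question above: nothing in flink-conf.yaml itself points Flink at the HDFS cluster. For Flink 1.7, the HDFS client settings come from Hadoop's core-site.xml and hdfs-site.xml, which Flink locates via the HADOOP_CONF_DIR environment variable (or the older fs.hdfs.hadoopconf option in flink-conf.yaml). A small, self-contained sketch to verify that the process environment actually sees those files; the class name is hypothetical:

// Sketch: check whether HADOOP_CONF_DIR is set and points at a directory
// that actually contains the Hadoop client configuration files.
public class HadoopConfDirCheck {
    public static void main(String[] args) {
        String confDir = System.getenv("HADOOP_CONF_DIR");
        System.out.println("HADOOP_CONF_DIR = " + confDir);
        if (confDir != null) {
            java.io.File coreSite = new java.io.File(confDir, "core-site.xml");
            java.io.File hdfsSite = new java.io.File(confDir, "hdfs-site.xml");
            System.out.println("core-site.xml present: " + coreSite.isFile());
            System.out.println("hdfs-site.xml present: " + hdfsSite.isFile());
        }
    }
}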
Re: Need help on orcsourcetable with hdfs
#historyserver.archive.fs.dir: hdfs:///completed-jobs/

# Interval in milliseconds for refreshing the monitored directories.
#historyserver.archive.fs.refresh-interval: 1

akka.ask.timeout: 1000 s
akka.client.timeout: 1000 s
akka.lookup.timeout: 1000 s
web.timeout: 100
taskmanager.debug.memory.log: true

*hdfs-site.xml:*

On Tue, 15 Oct 2019 at 08:38, 刘芃成 wrote:

> Maybe you can paste your Flink configuration and hdfs-site.xml and check
> whether there are problems in the HDFS-filesystem-related conf. You should
> also check whether the path really exists on HDFS with an hdfs shell
> command (e.g. hdfs dfs -ls /xxx; see
> https://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/FileSystemShell.html).
>
> At 2019-10-15 01:27:39, "Pritam Sadhukhan" wrote:
>
>> Hi,
>>
>> I am trying to use OrcTableSource to fetch data stored in Hive tables
>> on HDFS. I am able to use OrcTableSource to fetch the data and
>> deserialize it on a local cluster.
>>
>> But when I am trying to use the hdfs path, it is throwing a
>> file-not-found error.
>>
>> Any help on the topic will be appreciated.
>>
>> Versions:
>>
>> Flink: 1.7.1
>> Hive: 2.3.4
>>
>> *Code snippet:*
>>
>> import org.apache.flink.api.java.DataSet;
>> import org.apache.flink.api.java.ExecutionEnvironment;
>> import org.apache.flink.configuration.Configuration;
>> import org.apache.flink.core.fs.FileSystem;
>> import org.apache.flink.orc.OrcTableSource;
>> import org.apache.flink.table.api.java.BatchTableEnvironment;
>> import org.apache.flink.table.api.Table;
>> import org.apache.flink.table.api.TableEnvironment;
>> import org.apache.flink.types.Row;
>>
>> final ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
>> BatchTableEnvironment tableEnvironment = TableEnvironment.getTableEnvironment(environment);
>> OrcTableSource orcTS = OrcTableSource.builder()
>>         .path("hdfs://host:port/logs/sa_structured_events")
>>         .forOrcSchema(new OrcSchemaProvider().getStructuredEventsSchema())
>>         .build();
>>
>> tableEnvironment.registerTableSource("OrcTable", orcTS);
>> Table result = tableEnvironment.sqlQuery("SELECT * FROM OrcTable");
>>
>> DataSet<Row> rowDataSet = tableEnvironment.toDataSet(result, Row.class);
>>
>> tableEnvironment.execEnv().execute();
>>
>> *Error:*
>> 2019-10-14 16:56:26,048 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSource
>> (OrcFile[path=hdfs://host:port/logs/sa_structured_events, schema=struct<...>])
>> (9e1ad40a0f0b80ef0ad8d3b2fc58816d) switched from RUNNING to FAILED.
>> java.io.FileNotFoundException: File
>> /logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
>> does not exist
>> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
>> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>> at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
>> at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:517)
>> at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:364)
>> at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
>> at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
>> at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
>> at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
>> at java.lang.Thread.run(Unknown Source)
>> 2019-10-14 16:56:26,048 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink
>> Java Job at Mon Oct 14 16:56:07 IST 2019 (26a54fbcbd46cd0c4796e7308a2ba3b0)
>> switched from state RUNNING to FAILING.
>> java.io.FileNotFoundException: File
>> /logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
>> does not exist
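A note on the trace quoted above: although the configured path is an hdfs:// URI, every filesystem frame is in org.apache.hadoop.fs.RawLocalFileSystem / ChecksumFileSystem, i.e. the ORC reader is looking for /logs/sa_structured_events/... on the local disk. That points at the Hadoop client configuration (core-site.xml / hdfs-site.xml, in particular fs.defaultFS) not being visible to the code that opens the file, rather than at the file itself. A minimal, self-contained sketch of that resolution behaviour; the config-file locations are hypothetical placeholders, and hadoop-hdfs must be on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: show how a scheme-less path resolves depending on fs.defaultFS.
public class HdfsResolutionCheck {
    public static void main(String[] args) throws Exception {
        // An empty Configuration defaults fs.defaultFS to file:///, so a path
        // like /logs/sa_structured_events resolves to the local filesystem --
        // matching the RawLocalFileSystem/ChecksumFileSystem frames above.
        Configuration conf = new Configuration();
        System.out.println(FileSystem.get(conf).getClass().getName());

        // With the cluster's client configs loaded (paths are placeholders),
        // the same lookup should return DistributedFileSystem instead.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.getClass().getName());
        System.out.println(fs.exists(new Path("/logs/sa_structured_events")));
    }
}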
Need help on orcsourcetable with hdfs
Hi,

I am trying to use OrcTableSource to fetch data stored in Hive tables on HDFS.
I am able to use OrcTableSource to fetch the data and deserialize it on a local cluster.

But when I am trying to use the hdfs path, it is throwing a file-not-found error.

Any help on the topic will be appreciated.

Versions:

Flink: 1.7.1
Hive: 2.3.4

*Code snippet:*

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.orc.OrcTableSource;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.types.Row;

final ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tableEnvironment = TableEnvironment.getTableEnvironment(environment);
OrcTableSource orcTS = OrcTableSource.builder()
        .path("hdfs://host:port/logs/sa_structured_events")
        .forOrcSchema(new OrcSchemaProvider().getStructuredEventsSchema())
        .build();

tableEnvironment.registerTableSource("OrcTable", orcTS);
Table result = tableEnvironment.sqlQuery("SELECT * FROM OrcTable");

DataSet<Row> rowDataSet = tableEnvironment.toDataSet(result, Row.class);

tableEnvironment.execEnv().execute();

*Error:*

2019-10-14 16:56:26,048 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSource
(OrcFile[path=hdfs://host:port/logs/sa_structured_events, schema=struct<...>])
(9e1ad40a0f0b80ef0ad8d3b2fc58816d) switched from RUNNING to FAILED.
java.io.FileNotFoundException: File
/logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:517)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:364)
at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Unknown Source)
2019-10-14 16:56:26,048 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink
Java Job at Mon Oct 14 16:56:07 IST 2019 (26a54fbcbd46cd0c4796e7308a2ba3b0)
switched from state RUNNING to FAILING.
java.io.FileNotFoundException: File
/logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:517)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:364)
at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Unknown Source)

Regards,
Pritam.
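A closing observation, offered as a sketch rather than a confirmed fix: if I read the 1.7 flink-orc sources correctly, OrcRowInputFormat opens the file through Hadoop's FileSystem using the Hadoop Configuration supplied to the OrcTableSource builder, and with the default empty configuration the path falls back to fs.defaultFS = file:/// — which would explain the local-filesystem trace above even though readTextFile (served by Flink's own filesystem discovery) works. Assuming OrcTableSource.Builder#withConfiguration is available in 1.7.1, something like the following might route the reader to HDFS; the fs.defaultFS value is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.flink.orc.OrcTableSource;

// Sketch: pass an explicit Hadoop Configuration so the ORC reader resolves
// the path against HDFS instead of the default file:/// filesystem.
Configuration hadoopConf = new Configuration();
hadoopConf.set("fs.defaultFS", "hdfs://host:port");  // placeholder authority
// Alternatively, load the cluster's client configs:
// hadoopConf.addResource(new org.apache.hadoop.fs.Path("/etc/hadoop/conf/core-site.xml"));
// hadoopConf.addResource(new org.apache.hadoop.fs.Path("/etc/hadoop/conf/hdfs-site.xml"));

OrcTableSource orcTS = OrcTableSource.builder()
        .path("hdfs://host:port/logs/sa_structured_events")
        .forOrcSchema(new OrcSchemaProvider().getStructuredEventsSchema())
        .withConfiguration(hadoopConf)
        .build();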