Re: Build error: package does not exist

2019-10-28 Thread Pritam Sadhukhan
Hi Hynek,

I used `mvn clean install -DskipTests -Drat.skip=true` in the root
directory.
I built the release-1.9 branch and it took me around 1 hour 23 minutes to
complete with no errors.

Regards,
Pritam.

On Tue, 29 Oct 2019 at 01:02, Hynek Noll  wrote:

> And I've tried just
> `mvn clean install -DskipTests -Drat.skip=true -DskipITs`
> as well. It takes around half an hour, so I'm not too keen to try all the
> possibilities.
>
> I guess there might be some other SDKs/libraries that I'm missing and Maven
> won't tell me? Or just some random incompatibility?
>
> Thanks for any tips,
> Hynek
>
> On Mon, 28 Oct 2019 at 20:28, Hynek Noll  wrote:
>
> > And I've tried just
> > `mvn clean install -DskipTests -Drat.skip=true -DskipITs`
> > as well. It takes around half an hour, so I'm not too keen to try all the
> > possibilities.
> >
> > I guess there might be some other SDKs/libraries that I'm missing and
> > Maven won't tell me? Or just some random incompatibility?
> >
> > Thanks for any tips,
> > Hynek
> >
> > On Mon, 28 Oct 2019 at 20:24, Hynek Noll  wrote:
> >
> >> Dear Pritam,
> >> I've tried that as well, specifically I ran:
> >> `mvn clean install -DskipTests -Drat.skip=true -DskipITs
> >> -Pinclude-kinesis -Daws.kinesis-kpl.version=0.12.6`
> >> But the result is still the same. During the build, the packages that I
> >> suppose should be generated by Maven based on Amazon Kinesis are
> missing.
> >>
> >> Best regards,
> >> Hynek
> >>
> >> On Mon, 28 Oct 2019 at 19:57, Pritam Sadhukhan <
> >> sadhukhan.pri...@gmail.com> wrote:
> >>
> >>> Hi Hynek,
> >>>
> >>> please run mvn clean install -DskipTests -Drat.skip=true.
> >>>
> >>> It should build properly but takes time.
> >>>
> >>> Regards
> >>>
> >>> On Mon, Oct 28, 2019, 10:06 PM Hynek Noll 
> wrote:
> >>>
> >>> > Hi Bruce and Jark,
> >>> > Thank you for the tip, but I already did something similar by clicking
> >>> > "Generate Sources and Update Folders". I tried the suggested command(s),
> >>> > but without success, unfortunately.
> >>> > Executing `mvn clean install -DskipTests` resulted in an error: "Too many
> >>> > files with unapproved license: 2 See RAT report ...". (In the report it
> >>> > states the two files are:
> >>> > flink-core/src/test/resources/abstractID-with-toString-field
> >>> > flink-core/src/test/resources/abstractID-with-toString-field-set
> >>> > ) Meanwhile, `mvn clean package -DskipTests` actually runs for 30+ minutes
> >>> > (much longer than the first command, but maybe the first one stops early
> >>> > because of the error) and finishes fine, but I have the same problems
> >>> > afterwards.
> >>> >
> >>> > I've tried switching to Maven 3.1.1 (from 3.6.1).
> >>> >
> >>> > Now the one thing that resolved the missing package stated above was
> >>> > switching to the Scala 2.11.12 SDK (instead of 2.12)!
> >>> >
> >>> > The steps I've been taking were (within IntelliJ): Invalidate Caches &
> >>> > Restart (alternatively, exit and delete the .idea folder) -> open IntelliJ
> >>> > again -> Maven Reimport -> Maven Generate Sources and Update Folders ->
> >>> > Build Project. That results in further package(s) missing:
> >>> >
> >>> > *Error:(21, 53) java: package org.apache.flink.kinesis.shaded.com.amazonaws
> >>> > does not exist*
> >>> > Maybe it now has to do with just the dependency shading?
> >>> >
> >>> > Best regards,
> >>> > Hynek
> >>> >
> >>> > On Sun, 27 Oct 2019 at 15:02, Jark Wu  wrote:
> >>> >
> >>> > > Hi Hynek,
> >>> > >
> >>> > > Bruce is right, you should build the Flink source code first, before
> >>> > > developing, by running `mvn clean package -DskipTests` in the root
> >>> > > directory of Flink.
> >>> > > This may take 10 minutes or more, depending on your machine.
> >>> > >
> >>> > > Best,
> >>> > > Jark
> >>> > >
> >>> > > On Sun, 27 Oct 2019 at 20:46, yanjun qiu 
> >>> wrote:
> >>> > >
> >>> > > > Hi Hynek,
> >>> > > > I think you should run the Maven build first: execute `mvn clean
> >>> > > > install -DskipTests`. This is because the Flink SQL parser uses the
> >>> > > > Apache Calcite framework to generate the SQL parser source code.
> >>> > > >
> >>> > > > Regards,
> >>> > > > Bruce
> >>> > > >
> >>> > > > > On 27 Oct 2019, at 12:09 AM, Hynek Noll  wrote:
> >>> > > > >
> >>> > > > > package seems to be missing on GitHub:
> >>> > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
>
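
For context on the shading question above: the error "package
org.apache.flink.kinesis.shaded.com.amazonaws does not exist" refers to the
relocated (shaded) copy of the AWS SDK that the flink-connector-kinesis module
produces when it is packaged with the include-kinesis profile. Classes under
that package only exist in the shaded connector jar installed into the local
Maven repository, so the IDE cannot resolve them from the raw sources alone.
A minimal illustration of such an import (ClientConfiguration is used purely
as an example AWS SDK class here; exactly which classes are relocated depends
on the connector's shading rules):

// Illustrative only: importing an AWS SDK class through the relocated
// package. This compiles only against the shaded flink-connector-kinesis
// jar, not against the connector's unbuilt sources.
import org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfiguration;

public class ShadedImportExample {
    public static void main(String[] args) {
        // If the shaded jar is on the classpath, the relocated class resolves.
        ClientConfiguration config = new ClientConfiguration();
        System.out.println(config.getClass().getName());
    }
}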


Re: Build error: package does not exist

2019-10-28 Thread Pritam Sadhukhan
Hi Hynek,

please run mvn clean install -DskipTests -Drat.skip=true.

It should build properly but takes time.

Regards

On Mon, Oct 28, 2019, 10:06 PM Hynek Noll  wrote:

> Hi Bruce and Jark,
> Thank you for the tip, but I already did something similar by clicking
> "Generate Sources and Update Folders". I tried the suggested command(s), but
> without success, unfortunately.
> Executing `mvn clean install -DskipTests` resulted in an error: "Too many
> files with unapproved license: 2 See RAT report ...". (In the report it
> states the two files are:
> flink-core/src/test/resources/abstractID-with-toString-field
> flink-core/src/test/resources/abstractID-with-toString-field-set
> ) Meanwhile, `mvn clean package -DskipTests` actually runs for 30+ minutes
> (much longer than the first command, but maybe the first one stops early
> because of the error) and finishes fine, but I have the same problems
> afterwards.
>
> I've tried switching to Maven 3.1.1 (from 3.6.1).
>
> Now the one thing that resolved the missing package stated above was
> switching to the Scala 2.11.12 SDK (instead of 2.12)!
>
> The steps I've been taking were (within IntelliJ): Invalidate Caches &
> Restart (alternatively, exit and delete the .idea folder) -> open IntelliJ
> again -> Maven Reimport -> Maven Generate Sources and Update Folders ->
> Build Project. That results in further package(s) missing:
>
> *Error:(21, 53) java: package org.apache.flink.kinesis.shaded.com.amazonaws
> does not exist*
> Maybe it now has to do with just the dependency shading?
>
> Best regards,
> Hynek
>
> On Sun, 27 Oct 2019 at 15:02, Jark Wu  wrote:
>
> > Hi Hynek,
> >
> > Bruce is right, you should build the Flink source code first, before
> > developing, by running `mvn clean package -DskipTests` in the root
> > directory of Flink.
> > This may take 10 minutes or more, depending on your machine.
> >
> > Best,
> > Jark
> >
> > On Sun, 27 Oct 2019 at 20:46, yanjun qiu  wrote:
> >
> > > Hi Hynek,
> > > I think you should run the Maven build first: execute `mvn clean
> > > install -DskipTests`. This is because the Flink SQL parser uses the
> > > Apache Calcite framework to generate the SQL parser source code.
> > >
> > > Regards,
> > > Bruce
> > >
> > > > On 27 Oct 2019, at 12:09 AM, Hynek Noll  wrote:
> > > >
> > > > package seems to be missing on GitHub:
> > >
> > >
> >
>


Re: Need help on orcsourcetable with hdfs

2019-10-15 Thread Pritam Sadhukhan
Can anyone please help me with the conf files?
Am I missing anything on the configuration part?

Regards,
Pritam.
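
One detail worth checking, based on the FileNotFoundException quoted further
down in this thread: every frame in that stack trace goes through
org.apache.hadoop.fs.RawLocalFileSystem and ChecksumFileSystem, which suggests
the ORC reader resolved the path against the local filesystem rather than
HDFS, typically because no core-site.xml/hdfs-site.xml with fs.defaultFS is
visible to the Hadoop Configuration used by the ORC input format. A small
diagnostic sketch (the config-file locations in the comments are example
paths, not taken from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HadoopConfCheck {
    public static void main(String[] args) throws Exception {
        // A Configuration built like this only sees core-site.xml/hdfs-site.xml
        // that are on the classpath (or added explicitly).
        Configuration conf = new Configuration();

        // Uncomment and adjust if the Hadoop config files are not on the classpath:
        // conf.addResource(new org.apache.hadoop.fs.Path("/etc/hadoop/conf/core-site.xml"));
        // conf.addResource(new org.apache.hadoop.fs.Path("/etc/hadoop/conf/hdfs-site.xml"));

        // With no Hadoop config visible, fs.defaultFS falls back to file:///
        // and FileSystem.get() returns a local filesystem, matching the
        // RawLocalFileSystem frames in the stack trace.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        System.out.println("filesystem   = " + FileSystem.get(conf).getClass().getName());
    }
}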

On Tue, 15 Oct 2019 at 08:48, Pritam Sadhukhan 
wrote:

> Thanks for the information.
>
> I am able to see all the files using the HDFS shell command.
> I am even able to pull the data into Flink with
>
> environment.readTextFile("hdfs://host:port/qlake/logs/sa_structured_events")
>
> The issue is only with the ORC data source implementation.
> Here are my configuration files.
>
> *flink-conf.yaml:*
>
> 
> #  Licensed to the Apache Software Foundation (ASF) under one
> #  or more contributor license agreements.  See the NOTICE file
> #  distributed with this work for additional information
> #  regarding copyright ownership.  The ASF licenses this file
> #  to you under the Apache License, Version 2.0 (the
> #  "License"); you may not use this file except in compliance
> #  with the License.  You may obtain a copy of the License at
> #
> #  http://www.apache.org/licenses/LICENSE-2.0
> #
> #  Unless required by applicable law or agreed to in writing, software
> #  distributed under the License is distributed on an "AS IS" BASIS,
> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> #  See the License for the specific language governing permissions and
> # limitations under the License.
>
> 
>
>
>
> #==
> # Common
>
> #==
>
> # The external address of the host on which the JobManager runs and can be
>
> # reached by the TaskManagers and any clients which want to connect. This 
> setting
>
> # is only used in Standalone mode and may be overwritten on the JobManager 
> side
>
> # by specifying the --host  parameter of the bin/jobmanager.sh 
> executable.
>
> # In high availability mode, if you use the bin/start-cluster.sh script and 
> setup
>
> # the conf/masters file, this will be taken care of automatically. Yarn/Mesos
>
> # automatically configure the host name based on the hostname of the node 
> where the
> # JobManager runs.
>
> jobmanager.rpc.address: localhost
>
> # The RPC port where the JobManager is reachable.
>
> jobmanager.rpc.port: 6123
>
>
> # The heap size for the JobManager JVM
>
> jobmanager.heap.size: 1024m
>
>
> # The heap size for the TaskManager JVM
>
> taskmanager.heap.size: 1024m
>
>
>
> # The number of task slots that each TaskManager offers. Each slot runs one 
> parallel pipeline.
>
> taskmanager.numberOfTaskSlots: 8
>
>
> # The parallelism used for programs that did not specify and other 
> parallelism.
>
> parallelism.default: 1
>
> # The default file system scheme and authority.
> #
>
> # By default file paths without scheme are interpreted relative to the local
>
> # root file system 'file:///'. Use this to override the default and interpret
> # relative paths relative to a different file system,
> # for example 'hdfs://mynamenode:12345'
> #
> # fs.default-scheme
>
>
> #==
> # High Availability
>
> #==
>
> # The high-availability mode. Possible options are 'NONE' or 'zookeeper'.
> #
> # high-availability: zookeeper
>
>
> # The path where metadata for master recovery is persisted. While ZooKeeper 
> stores
>
> # the small ground truth for checkpoint and leader election, this location 
> stores
> # the larger objects, like persisted dataflow graphs.
> #
> # Must be a durable file system that is accessible from all nodes
> # (like HDFS, S3, Ceph, nfs, ...)
> #
> # high-availability.storageDir: hdfs:///flink/ha/
>
> # The list of ZooKeeper quorum peers that coordinate the high-availability
> # setup. This must be a list of the form:
> # "host1:clientPort,host2:clientPort,..." (default clientPort: 2181)
> #
> # high-availability.zookeeper.quorum: localhost:2181
>
>
> # ACL options are based on
> https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
>
> # It can be either "creator" (ZOO_CREATE_ALL_ACL) or "open" 
> (ZOO_OPEN_ACL_UNSAFE)
>
> # The default value is "open" and it can be changed to "creator" if ZK 
> security is enabled
> #
> # high-availability.zookeeper.client.acl: o

Re: Need help on orcsourcetable with hdfs

2019-10-14 Thread Pritam Sadhukhan
s.
#historyserver.archive.fs.dir: hdfs:///completed-jobs/

# Interval in milliseconds for refreshing the monitored directories.
#historyserver.archive.fs.refresh-interval: 1

akka.ask.timeout: 1000 s
akka.client.timeout: 1000 s
akka.lookup.timeout: 1000 s

web.timeout: 100
taskmanager.debug.memory.log: true

*hdfs-site.xml:* (the XML contents were stripped by the mailing-list archive
and are not preserved here)

On Tue, 15 Oct 2019 at 08:38, 刘芃成  wrote:

> Maybe you can paste your Flink configuration and hdfs-site.xml and check
> whether there are any problems in the HDFS-filesystem-related configuration.
> You should also check whether this path really exists on HDFS with an HDFS
> shell command (e.g. `hdfs dfs -ls /xxx`; see
> https://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/FileSystemShell.html
> )
> At 2019-10-15 01:27:39, "Pritam Sadhukhan" 
> wrote:
> >Hi,
> >
> >I am trying to use OrcTableSource to fetch data stored in Hive tables on
> >HDFS.
> >I am able to use the OrcTableSource to fetch the data and deserialize it on
> >a local cluster.
> >
> >But when I try to use the HDFS path, it throws a file-not-found
> >error.
> >
> >Any help on the topic will be appreciated.
> >
> >Versions:
> >
> >Flink: 1.7.1
> >Hive: 2.3.4
> >
> >*Code snippet:*
> >
> >import org.apache.flink.api.java.DataSet;
> >import org.apache.flink.api.java.ExecutionEnvironment;
> >import org.apache.flink.configuration.Configuration;
> >import org.apache.flink.core.fs.FileSystem;
> >import org.apache.flink.orc.OrcTableSource;
> >import org.apache.flink.table.api.java.BatchTableEnvironment;
> >import org.apache.flink.table.api.Table;
> >import org.apache.flink.table.api.TableEnvironment;
> >import org.apache.flink.types.Row;
> >
> >final ExecutionEnvironment environment = ExecutionEnvironment
> >.getExecutionEnvironment();
> >BatchTableEnvironment tableEnvironment =
> >TableEnvironment.getTableEnvironment(environment);
> >OrcTableSource orcTS = OrcTableSource.builder()
> >.path("hdfs://host:port/logs/sa_structured_events")
> >.forOrcSchema(new
> >OrcSchemaProvider().getStructuredEventsSchema())
> >.build();
> >
> >tableEnvironment.registerTableSource("OrcTable", orcTS);
> >Table result = tableEnvironment.sqlQuery("SELECT * FROM OrcTable");
> >
> >DataSet<Row> rowDataSet = tableEnvironment.toDataSet(result, Row.class);
> >
> >tableEnvironment.execEnv().execute();
> >
> >
> >*Error:*
> >2019-10-14 16:56:26,048 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph-
> DataSource
> >(OrcFile[path=hdfs://host:port/logs/sa_structured_events,
> >schema=struct<...>]) (9e1ad40a0f0b80ef0ad8d3b2fc58816d) switched from RUNNING to FAILED.
> >java.io.FileNotFoundException: File
>
> >/logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
> >does not exist
> >at
>
> >org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
> >at
>
> >org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
> >at
>
> >org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
> >at
>
> >org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> >at
>
> >org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
> >at
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
> >at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
> >at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:517)
> >at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:364)
> >at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
> >at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
> >at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
> >at
>
> >org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
> >at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> >at java.lang.Thread.run(Unknown Source)
> >2019-10-14 16:56:26,048 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Flink
> >Java Job at Mon Oct 14 16:56:07 IST 2019
> (26a54fbcbd46cd0c4796e7308a2ba3b0)
> >switched from state RUNNING to FAILING.
> >java.io.FileNotFoundException: File
>
> >/logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
> >does not exist
> >at

Need help on orcsourcetable with hdfs

2019-10-14 Thread Pritam Sadhukhan
Hi,

I am trying to use OrcTableSource to fetch data stored in Hive tables on
HDFS.
I am able to use the OrcTableSource to fetch the data and deserialize it on
a local cluster.

But when I try to use the HDFS path, it throws a file-not-found
error.

Any help on the topic will be appreciated.

Versions:

Flink: 1.7.1
Hive: 2.3.4

*Code snippet:*

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.orc.OrcTableSource;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.types.Row;

final ExecutionEnvironment environment = ExecutionEnvironment
.getExecutionEnvironment();
BatchTableEnvironment tableEnvironment =
TableEnvironment.getTableEnvironment(environment);
OrcTableSource orcTS = OrcTableSource.builder()
.path("hdfs://host:port/logs/sa_structured_events")
.forOrcSchema(new
OrcSchemaProvider().getStructuredEventsSchema())
.build();

tableEnvironment.registerTableSource("OrcTable", orcTS);
Table result = tableEnvironment.sqlQuery("SELECT * FROM OrcTable");

DataSet<Row> rowDataSet = tableEnvironment.toDataSet(result, Row.class);

tableEnvironment.execEnv().execute();


*Error:*
2019-10-14 16:56:26,048 INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph- DataSource
(OrcFile[path=hdfs://host:port/logs/sa_structured_events,
schema=struct<...>]) (9e1ad40a0f0b80ef0ad8d3b2fc58816d) switched from RUNNING to FAILED.
java.io.FileNotFoundException: File
/logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
does not exist
at
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:517)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:364)
at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
at
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Unknown Source)
2019-10-14 16:56:26,048 INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Flink
Java Job at Mon Oct 14 16:56:07 IST 2019 (26a54fbcbd46cd0c4796e7308a2ba3b0)
switched from state RUNNING to FAILING.
java.io.FileNotFoundException: File
/logs/sa_structured_events/part-0-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc
does not exist
at
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:517)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:364)
at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
at
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Unknown Source)


Regards,
Pritam.
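
Given that the stack trace above goes through RawLocalFileSystem, one possible
direction is to hand the ORC source a Hadoop Configuration that actually
points at the namenode, instead of relying on whatever (possibly empty)
configuration the ORC input format picks up by default. The sketch below is
only an illustration under two assumptions: that the OrcTableSource builder in
the Flink version used here exposes withConfiguration(...), and that the
config-file paths and the fs.defaultFS value (placeholders below) are replaced
with the real ones for the cluster. Pointing HADOOP_CONF_DIR at the cluster's
Hadoop configuration on all Flink nodes is the usual alternative.

import org.apache.flink.orc.OrcTableSource;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class OrcWithHadoopConf {

    public static OrcTableSource buildSource(String orcSchema) {
        // Build a Hadoop Configuration that knows about the HDFS namenode.
        // The file locations and fs.defaultFS value below are placeholders.
        Configuration hadoopConf = new Configuration();
        hadoopConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        hadoopConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        hadoopConf.set("fs.defaultFS", "hdfs://host:port");

        // Pass the configuration to the ORC source so the underlying
        // OrcRowInputFormat opens files through HDFS, not the local filesystem.
        return OrcTableSource.builder()
                .path("hdfs://host:port/logs/sa_structured_events")
                .forOrcSchema(orcSchema)
                .withConfiguration(hadoopConf)
                .build();
    }
}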