Re: Use different S3 access key for different S3 bucket

2024-01-18 Thread Josh Mahonin via user
Oops my syntax was a bit off there, as shown in the Hadoop docs, it looks like: fs.s3a.bucket.&lt;bucket&gt;.&lt;key&gt; Josh >

Re: Use different S3 access key for different S3 bucket

2024-01-18 Thread Josh Mahonin via user
#Configuring_different_S3_buckets_with_Per-Bucket_Configuration I'm not certain if the 's3.&lt;...&gt;' syntax will work, you may need to set the 'fs.s3a.&lt;...&gt;' values directly. Josh On Thu, Jan 18, 2024 at 6:02 AM Qing Lim wrote: > Hi Jun > > > > I am indeed talking about processing two different tables, but I
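Per the Hadoop docs referenced above, a per-bucket S3A option overrides the default credentials for that one bucket only. A minimal sketch in flink-conf.yaml form (the bucket name `data-eu` and all key values are placeholders, not from the thread; depending on the setup these may instead belong in Hadoop's core-site.xml):

```yaml
# Default credentials, used for any bucket without an override
fs.s3a.access.key: DEFAULT_ACCESS_KEY
fs.s3a.secret.key: DEFAULT_SECRET_KEY

# Per-bucket override: applies only when accessing s3a://data-eu/...
fs.s3a.bucket.data-eu.access.key: EU_ACCESS_KEY
fs.s3a.bucket.data-eu.secret.key: EU_SECRET_KEY
```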

Re: Filter push-down not working for a custom BatchTableSource

2019-05-03 Thread Josh Bradt
Hi Fabian, Thanks for taking a look. I've filed this ticket: https://issues.apache.org/jira/browse/FLINK-12399 Thanks, Josh On Fri, May 3, 2019 at 3:41 AM Fabian Hueske wrote: > Hi Josh, > > The code looks good to me. > This seems to be a bug then. > It's strange that i

Re: Filter push-down not working for a custom BatchTableSource

2019-05-02 Thread Josh Bradt
o(getReturnType()); } } Thanks, Josh On Thu, May 2, 2019 at 3:42 AM Fabian Hueske wrote: > Hi Josh, > > Does your TableSource also implement ProjectableTableSource? > If yes, you need to make sure that the filter information is also > forwarded if ProjectableTableSource.p

Filter push-down not working for a custom BatchTableSource

2019-04-30 Thread Josh Bradt
, Josh -- Josh Bradt Software Engineer 225 Franklin St, Boston, MA 02110 klaviyo.com <https://www.klaviyo.com>

Get get file name when reading from files? Or assign timestamps from file creation time?

2018-04-06 Thread Josh Lemer
Hey there, is it possible to somehow read the filename of elements that are read from `env.readFile`? In our case, the date of creation is encoded in the file name. Otherwise, maybe it is possible to assign timestamps somehow by the file's creation time directly? Thanks!
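One workaround for the question above, given that the creation date is encoded in the file name, is to parse the date out of the name and assign it as the event timestamp. A minimal sketch of that parsing step (the `events-YYYY-MM-DD.csv` naming scheme is a made-up example, not from the thread):

```python
from datetime import datetime, timezone

def timestamp_from_filename(filename: str) -> int:
    """Extract an epoch-millisecond timestamp from a name like
    'events-2018-04-06.csv' (hypothetical naming scheme)."""
    # Drop the extension, then take everything after the first '-'.
    date_part = filename.rsplit(".", 1)[0].split("-", 1)[1]
    # Parse the date and interpret it as midnight UTC.
    dt = datetime.strptime(date_part, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)
```

A function like this could back a timestamp assigner once the file name is available to the pipeline.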

Bucketing Sink does not complete files, when source is from a collection

2018-04-04 Thread Josh Lemer
Hello, I was wondering if I could get some pointers on what I'm doing wrong here. I posted this on Stack Overflow, but I thought I'd also ask here. I'm trying to generate

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-05 Thread Josh
is not a long term solution, so if anyone has any ideas I can restore the large state and investigate further. Btw sorry for going a bit off topic on this thread! On Fri, Nov 4, 2016 at 11:19 AM, Josh <jof...@gmail.com> wrote: > Hi Scott & Stephan, > > The problem has happened

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
tion. You > can do this via -z flag > (https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cli.html). > > Does this work? > > On Fri, Nov 4, 2016 at 3:28 PM, Josh <jof...@gmail.com> wrote: > > Hi Ufuk, > > > > I see, but in my case the

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
, Josh On Fri, Nov 4, 2016 at 1:52 PM, Ufuk Celebi <u...@apache.org> wrote: > No you don't need to manually trigger a savepoint. With HA checkpoints > are persisted externally and store a pointer in ZooKeeper to recover > them after a JobManager failure. > > On Fri, Nov 4, 2016

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Josh
t checkpoint? I can use `flink run -m yarn-cluster -s s3://my-savepoints/id .` to restore from a savepoint, but what if I haven't manually taken a savepoint recently? Thanks, Josh On Fri, Nov 4, 2016 at 10:06 AM, Maximilian Michels <m...@apache.org> wrote: > Hi Anchit, > >
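The two restore paths discussed in this thread, as CLI sketches (the savepoint path, ZooKeeper namespace, and jar name are placeholders):

```shell
# Restore explicitly from a previously triggered savepoint
flink run -m yarn-cluster -s s3://my-savepoints/savepoint-abc123 my-job.jar

# With HA enabled, externally persisted checkpoints can be recovered via the
# ZooKeeper namespace of the failed cluster (the -z flag from the reply above)
flink run -m yarn-cluster -z /flink/my-app-namespace my-job.jar
```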

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-04 Thread Josh
t me know if you have any more ideas! Thanks, Josh On Thu, Nov 3, 2016 at 6:27 PM, Stephan Ewen <se...@apache.org> wrote: > Is it possible that you have stalls in your topology? > > Reasons could be: > > - The data sink blocks or becomes slow for some periods (where a

Re: Using Flink with Accumulo

2016-11-03 Thread Josh Elser
Hi Oliver, Cool stuff. I wish I knew more about Flink to make some better suggestions. Some points inline, and sorry in advance if I suggest something outright wrong. Hopefully someone from the Flink side can help give context where necessary :) Oliver Swoboda wrote: Hello, I'm using

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
ind anything useful in the logs so not sure what happened! Josh On Thu, Nov 3, 2016 at 12:44 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: > Hi Josh, > > That warning message was added as part of FLINK-4514. It pops out whenever > a shard iterator was used after 5 mi

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
,}}'}; refreshing the iterator ... Having restarted the job from my last savepoint, it's consuming the stream fine again with no problems. Do you have any idea what might be causing this, or anything I should do to investigate further? Cheers, Josh On Wed, Oct 5, 2016 at 4:55 AM, Tzu-Li (Gordon) Tai

Re: Checkpointing large RocksDB state to S3 - tips?

2016-10-25 Thread Josh
will just add more nodes whenever we need to speed up the checkpointing. Thanks, Josh On Tue, Oct 25, 2016 at 3:12 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi Josh, > Checkpoints that take longer than the checkpoint interval should not be an > issue (if you use an up-

Checkpointing large RocksDB state to S3 - tips?

2016-10-24 Thread Josh
managers? Also just wondering - is there any chance the incremental checkpoints work will be complete any time soon? Thanks, Josh

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Ah ok great, thanks! I will try upgrading sometime this week then. Cheers, Josh On Tue, Oct 11, 2016 at 5:37 PM, Stephan Ewen <se...@apache.org> wrote: > Hi Josh! > > I think the master has gotten more stable with respect to that. The issue > you mentioned should be fixed.

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
-fails-to-restore-RocksDB-state-after-upgrading-to-1-2-SNAPSHOT-td9110.html Sorry to jump around but do you know if that's fixed in the latest 1.2-SNAPSHOT? Was it resolved by Flink-4788? Thanks, Josh On Tue, Oct 11, 2016 at 4:13 PM, Stephan Ewen <se...@apache.org> wrote: >

Re: Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
Hi Aljoscha, Yeah I'm using S3. Is this a known problem when using S3? Do you have any ideas on how to restore my job from this state, or prevent it from happening again? Thanks, Josh On Tue, Oct 11, 2016 at 1:58 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > you

Exception when restoring state from RocksDB - how to recover?

2016-10-11 Thread Josh
checkpoint (e.g. the second-last checkpoint)? The version of Flink I'm using is 1.1-SNAPSHOT, from mid-June. Thanks, Josh [*]The exception when restoring state: java.lang.Exception: Could not restore checkpointed state to operators and functions

Re: Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-10-03 Thread Josh
) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition Any ideas what's going on here? Is the Kafka consumer state management broken right now in Flink master? Thanks, Josh On Thu, Sep 22, 2016 at 9:28 AM

Re: Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-09-21 Thread Josh
with an older version of Flink - but it would be good to know what's changed recently that's causing the restore to break and if my job is not going to be compatible with future releases of Flink. Best, Josh On Wed, Sep 21, 2016 at 8:21 PM, Stephan Ewen <se...@apache.org> wrote: >

Flink job fails to restore RocksDB state after upgrading to 1.2-SNAPSHOT

2016-09-21 Thread Josh
job in Java. Thanks, Josh

Re: Kinesis connector - Iterator expired exception

2016-08-26 Thread Josh
this happened around 5 times the job fully caught up to the head of the stream and started running smoothly again. Thanks for looking into this! Best, Josh On Fri, Aug 26, 2016 at 1:57 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: > Hi Josh, > > Thank you for reporting th

Re: Reprocessing data in Flink / rebuilding Flink state

2016-08-01 Thread Josh
y-2") I guess it would be nice if Flink could recover from removed tasks/operators without needing to add dummy operators like this. Josh On Fri, Jul 29, 2016 at 5:46 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > I have to try this to verify but I think the approach wo

Re: Reprocessing data in Flink / rebuilding Flink state

2016-07-29 Thread Josh
ding state as part of the main job when it comes across events that require historical data. Actually I think we'll need to do something very similar in the future but right now I can probably get away with something simpler! Thanks for the replies! Josh On Fri, Jul 29, 2016 at 2:35 PM, Jason Brell

Re: Reprocessing data in Flink / rebuilding Flink state

2016-07-29 Thread Josh
he job first starts and there's a period of time where it's rebuilding its state from the Kafka log). Since I would have everything needed to rebuild the state persisted in a Kafka topic, I don't think I would need a second Flink job for this? Thanks, Josh On Thu, Jul 28, 2016 at 6:57 PM, Ja

Reprocessing data in Flink / rebuilding Flink state

2016-07-28 Thread Josh
of how to do this replay & switchover (rebuild state by consuming from a historical log, then switch over to processing the live stream)? Thanks for any insights, Josh

Re: State in external db (dynamodb)

2016-07-25 Thread Josh
state in the external store with a tuple: (state, previous_state, checkpoint_id). Then when reading from the store, if checkpoint_id is in the future, read the previous_state, otherwise take the current_state. On Mon, Jul 25, 2016 at 3:20 PM, Josh <jof...@gmail.com> wrote: > Hi Chen,
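The read pattern described in the snippet above can be sketched with a plain dict standing in for the external store (function and variable names are illustrative, not from the thread):

```python
def write_state(store: dict, key, new_state, checkpoint_id: int) -> None:
    """On snapshot: keep both the new value and the previously written one,
    tagged with the checkpoint that produced the new value."""
    prev = store.get(key, (None, None, -1))[0]
    store[key] = (new_state, prev, checkpoint_id)

def read_state(store: dict, key, last_completed_checkpoint: int):
    """On read: if the stored value belongs to a checkpoint that has not
    completed yet, fall back to the previous (committed) value."""
    state, previous_state, checkpoint_id = store[key]
    if checkpoint_id > last_completed_checkpoint:  # write not yet committed
        return previous_state
    return state
```

This way a reader never observes state from a checkpoint that might still be rolled back.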

Re: State in external db (dynamodb)

2016-07-25 Thread Josh
checkpoints. Then it would implement the Checkpointed interface and write to the external store in snapshotState(...)? Thanks, Josh On Sun, Jul 24, 2016 at 6:00 PM, Chen Qin <qinnc...@gmail.com> wrote: > > > On Jul 22, 2016, at 2:54 AM, Josh <jof...@gmail.com> wrote: >

Re: State in external db (dynamodb)

2016-07-22 Thread Josh
The operator emits a new state (and updates its in-memory cache with the new state) 3. The sink batches up all the new states and upon checkpoint flushes them to the external store Could anyone point me at the work that's already been done on this? Has it already been merged into Flink? Thanks, J

Re: Dynamic partitioning for stream output

2016-07-11 Thread Josh
Hi guys, I've been working on this feature as I needed something similar. Have a look at my issue here https://issues.apache.org/jira/browse/FLINK-4190 and changes here https://github.com/joshfg/flink/tree/flink-4190 The changes follow Kostas's suggestion in this thread. Thanks, Josh On Thu

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Josh
) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290) at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) On Fri, Jul 1, 2016 at 10:21 AM, Josh <jof...@gmail.

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Josh
Thanks guys, that's very helpful info! @Aljoscha I thought I saw this exception on a job that was using the RocksDB state backend, but I'm not sure. I will do some more tests today to double check. If it's still a problem I'll try the explicit class definitions solution. Josh On Thu, Jun 30

Re: Flink on YARN - how to resize a running cluster?

2016-06-30 Thread Josh
in the future if I need to? Josh On Thu, Jun 30, 2016 at 11:17 AM, Till Rohrmann <trohrm...@apache.org> wrote: > Hi Josh, > > at the moment it is not possible to dynamically increase the parallelism > of your job. The same holds true for a restarting a job from a savepoint. >

Flink on YARN - how to resize a running cluster?

2016-06-29 Thread Josh
(not 6!) and restored my job from the last checkpoint. Can anyone point me in the right direction? Thanks, Josh

How to avoid breaking states when upgrading Flink job?

2016-06-29 Thread Josh
for a specific class, like com.me.flink.MyJob$$anon$1$$anon$7$$anon$4 ? Thanks for any advice, Josh

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-23 Thread Josh
Hi Aljoscha, I opened an issue here https://issues.apache.org/jira/browse/FLINK-4115 and submitted a pull request. I'm not sure if my fix is the best way to resolve this, or if it's better to just remove the verification checks completely. Thanks, Josh On Thu, Jun 23, 2016 at 9:41 AM, Aljoscha

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-17 Thread Josh
the RocksDBStateBackend which is erroring with the s3 URI. I'm wondering if it could be an issue with RocksDBStateBackend? On Fri, Jun 17, 2016 at 12:09 PM, Josh <jof...@gmail.com> wrote: > Hi Gordon/Fabian, > > Thanks for helping with this! Downgrading the Maven version I was using to > build Fli

Re: Moving from single-node, Maven examples to cluster execution

2016-06-16 Thread Josh
to a remote JobManager with the -m flag. Although I don't do this at the moment because it doesn't work so easily if you're running Flink on AWS/EMR. Josh On Thu, Jun 16, 2016 at 10:51 PM, Prez Cannady <revp...@opencorrelate.org> wrote: > Having a hard time trying to get my head a

Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-16 Thread Josh
what's going on? The job I'm deploying has httpclient 4.3.6 and httpcore 4.3.3 which I believe are the libraries with the HttpConnectionParams class. Thanks, Josh

Re: Migrating from one state backend to another

2016-06-15 Thread Josh
Hi Aljoscha, Thanks, that makes sense. I will start using RocksDB right away then. Josh On Wed, Jun 15, 2016 at 1:01 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > right now migrating from one state backend to another is not possible. I > have it in the back of m

Migrating from one state backend to another

2016-06-14 Thread Josh
was just wondering if it's possible/easy to use savepoints to migrate existing state from the filesystem backend to the RocksDB backend? As I would not want to lose any job state when switching to RocksDB. If there's a way to do it then I can worry about RocksDB later. Thanks! Josh

Re: Accessing StateBackend snapshots outside of Flink

2016-06-13 Thread Josh
ouldn't work, or anything to be careful of? Thanks, Josh On Mon, Apr 18, 2016 at 8:23 AM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > key refers to the key extracted by your KeySelector. Right now, for every > named state (i.e. the name in the StateDescriptor) there is an

Re: Using Flink watermarks and a large window state for scheduling

2016-06-09 Thread Josh
, and does the final transformation. I just thought there might be a nicer way to do it using Flink! On Thu, Jun 9, 2016 at 2:23 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi Josh, > I'll have to think a bit about that one. Once I have something I'll get > back to you. >

Using Flink watermarks and a large window state for scheduling

2016-06-08 Thread Josh
to execute the scheduled transformations. If anyone has any views on how this could be done, (or whether it's even possible/a good idea to do) with Flink then it would be great to hear! Thanks, Josh

Re: ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
Thanks Till, your suggestion worked! I actually just created a new SpecificData for each AvroDeserializationSchema instance, so I think it's still just as efficient. Josh On Wed, Jun 8, 2016 at 4:41 PM, Till Rohrmann <trohrm...@apache.org> wrote: > The only thing I could think of is t

Re: ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
(TimestampsAndPeriodicWatermarksOperator.java:63) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:351) ... 3 more On Wed, Jun 8, 2016 at 3:19 PM, Josh <jof...@gmail.com> wrote: > Hi Till, > > Thanks for the reply! I see - yes it does sound very much

Re: ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
].runtimeClass) Looking at SpecificData, it contains a classCache which is a map of strings to classes, similar to what's described in FLINK-1390. I'm not sure how to change my AvroDeserializationSchema to prevent this from happening though! Do you have any ideas? Josh On Wed, Jun 8, 2016 at 11:23 AM

ClassCastException when redeploying Flink job on running cluster

2016-06-08 Thread Josh
below), and I'm wondering if that's causing the problem when the Flink job creates an AvroDeserializationSchema[MyAvroType]. Does anyone have any ideas? Thanks, Josh class AvroDeserializationSchema[T <: SpecificRecordBase :ClassTag] extends DeserializationSchema[T] { ... private

Re: Combining streams with static data and using REST API as a sink

2016-05-25 Thread Josh
be a delay in receiving updates since the updates aren't being continuously ingested by Flink. But it certainly sounds like it would be a nice feature to have! Thanks, Josh On Tue, May 24, 2016 at 1:48 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi Josh, > for the first par

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
handle occasional updates in that case, since I guess the open() function is only called once? Do I need to periodically restart the job, or periodically trigger tasks to restart and refresh their data? Ideally I would want this job to be running constantly. Josh On Mon, May 23, 2016 at 5:56 PM

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
an HTTP REST call (updating the customer resource), rather than writing directly to a database? The reason I'd like to do it this way is to decouple the underlying database from Flink. Josh On Mon, May 23, 2016 at 2:35 PM, Al-Isawi Rami <rami.al-is...@comptel.com> wrote: > Hi Josh,

Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
API as a Flink sink - is there a reason why this might be a bad idea? Thanks for any advice on these, Josh

Re: External DB as sink - with processing guarantees

2016-03-12 Thread Josh
Hi Fabian, Thanks, that's very helpful. Actually most of my writes will be idempotent so I guess that means I'll get the exactly-once guarantee using the Hadoop output format! Thanks, Josh > On 12 Mar 2016, at 09:14, Fabian Hueske <fhue...@gmail.com> wrote: > > Hi Josh,
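The idempotency argument in this thread can be illustrated with a toy key-value store: if each write fully determines the value for its key, replaying writes (which at-least-once delivery can cause after a failure) leaves the store in the same state, so the *effect* is exactly-once. A minimal sketch (names are illustrative, not from the thread):

```python
def apply_writes(db: dict, writes) -> dict:
    """Apply keyed upserts to a store. Replaying any write is a no-op in
    effect, because the write's value depends only on its key."""
    for key, value in writes:
        db[key] = value  # idempotent upsert: applying twice == applying once
    return db

once = apply_writes({}, [("a", 1), ("b", 2)])
# Simulate an at-least-once replay after a failure: same batch, plus a duplicate
replayed = apply_writes(dict(once), [("a", 1), ("b", 2), ("a", 1)])
assert once == replayed
```

Non-idempotent writes (e.g. increments) would not get this guarantee and would need deduplication or transactional commits instead.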

Re: External DB as sink - with processing guarantees

2016-03-12 Thread Josh
? Thanks, Josh > On 12 Mar 2016, at 02:46, Nick Dimiduk <ndimi...@gmail.com> wrote: > > Pretty much anything you can write to from a Hadoop MapReduce program can be > a Flink destination. Just plug in the OutputFormat and go. > > Re: output semantics, your mileage may v

External DB as sink - with processing guarantees

2016-03-11 Thread Josh
to have Flink's processing guarantees? I.e. Can I be sure that every tuple has contributed to the DynamoDB state either at-least-once or exactly-once? Thanks for any advice, Josh