Re: Kryo version and default serializer

2018-06-20 Thread Vlad Rozov

Hi Aaron,

Nice to see more contributors to Apex! Welcome.

I'd suggest that we stop this e-mail thread and continue the discussion on 
the other, dev@apex-only e-mail thread. users@apex is usually not a good 
choice for discussing dependency upgrades.


Please be patient; the majority of Apex and Apache committers contribute in 
their free time, and there may be delays in responses.


Thank you,

Vlad

On 6/19/18 20:02, Aaron Bossert wrote:

cool! I'm on it.

On Tue, Jun 19, 2018 at 11:00 PM Pramod Immaneni <pramod.imman...@gmail.com> wrote:


Hi Aaron,

That would be great if you can pitch in wherever you can. Let us
know how we can help. Also, I was going to try the
new kryo version later today and see if there are any internal
errors that are bubbling up to those test failures but not getting
logged.

Thanks

On Tue, Jun 19, 2018 at 7:54 PM Aaron Bossert <aa...@punchcyber.com> wrote:

P.S.  I am happy to roll up my sleeves, but in my initial attempt the errors
I was seeing were not obviously related to the Kryo version update (though
I am certain they are... at least indirectly, because they only show up when
the version is changed), which makes it hard to track down where the changes
would be required... I am not really familiar with the inner workings of Apex
yet, though this would definitely be a good excuse to learn...

I also saw a bunch of annoying warnings when building related
to use of
deprecated methods/classes within the com.datatorrent.*
namespace...I'd be
happy to start tracking some of those down too if it would be
helpful...

On Tue, Jun 19, 2018 at 10:48 PM Aaron Bossert <aa...@punchcyber.com> wrote:

> Pramod,
>
> In a nutshell, yes.  I can set a @FieldSerializer on the fields in
> question...the only one I have run into thus far is
Instant...but it would
> be great to not need to annotate each field this way...Also,
I could see
> this becoming problematic should I encounter more than a
couple of fields
> that are not serializable using the older version of Kryo
(but presumably
> would work with a more current one)...though admittedly, it
would not be
> that big a deal in the interim.  It would be much cleaner to
have an update
> to the most current version of Kryo, IMHO...though I
understand that is
> something of a lift.  By the way, Kryo 5 just came out
yesterday, so if
> there is to be effort expended, it might be good to go with
the latest
> version...though, heads up, I tried it out and there seem to
be quite a few
> changes that will be needed...new methods, and definitely
some replaced
> ones...
>
>
>
> On Tue, Jun 19, 2018 at 10:37 PM Pramod Immaneni <pramod.imman...@gmail.com> wrote:
>
>> Hi Aaron,
>>
>> While we are debugging the test failures on dev, I didn't fully
>> understand the last question in your email. Looks like you
mentioned that
>> the workaround for adding the @FieldSerializer to the field
is working. Are
>> you looking for an alternative to this workaround and
trying to set a
>> default serializer for all fields of type Instant (the
field type you
>> mentioned in the earlier email) so that you don't have to
set the
>> annotation each time?
>>
>> Thanks
>>
>> On Tue, Jun 19, 2018 at 12:59 PM Aaron Bossert <aa...@punchcyber.com> wrote:
>>
>>> I sent an e-mail to the dev list... but have not heard any responses.
>>> How active is that list?  Do you know? Basically, I have a workaround that
>>> will deal with those "things" that Kryo v2.24.0 will not handle (but 4.0.2
>>> or 5.0.0-RC1 would) using the @FieldSerializer... but this is not ideal.  I
>>> have two options left: go it alone and try to update Kryo in apex-core,
>>> or simply change the default serializer.  The hiccup I am running into is
>>> that I don't see how (from the documentation) to set the default
>>> serializer... perhaps I am missing it.  Can someone point me to where that
>>> is in the docs?
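(For reference, a hedged sketch of the two approaches discussed here, binding a serializer to a single field versus registering a default serializer for a type, written against Kryo's own API. Whether and where apex-core exposes these hooks is an assumption; the class below is purely illustrative.)

import java.time.Instant;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.serializers.FieldSerializer;
import com.esotericsoftware.kryo.serializers.JavaSerializer;

public class KryoSerializationSketch
{
  // Per-field workaround: bind a specific serializer to one field.
  public static class Event
  {
    @FieldSerializer.Bind(JavaSerializer.class)
    Instant timestamp;
  }

  // Type-wide alternative: register a default serializer for Instant once,
  // so individual fields no longer need the annotation.
  public static Kryo newKryo()
  {
    Kryo kryo = new Kryo();
    kryo.addDefaultSerializer(Instant.class, JavaSerializer.class);
    return kryo;
  }
}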
>>>
>>> On Mon, Jun 11, 2018 at 4:20 PM Aaron Bossert <aa...@punchcyber.com> wrote:
>>>
 Ah, you make a good point about the dev list...I am using
IntelliJ on a
 Mac.  When I tried building without any changes, I also

Re: High CPU % even after increasing VCORES

2018-05-01 Thread Vlad Rozov
I am not sure I understand your assumptions regarding VCORES and CPU 
usage and/or number of tuples emitted per second. Operators and ports in 
Apex have high thread affinity, so with a single operator deployed into 
a container, it does not matter how many VCORES that container has.
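(Since a single operator runs on a single thread, the way to put more cores to work is to partition the operator. Below is a hedged sketch, assuming the CustomAvroToStringOperator posted later in this thread; stream wiring and other attributes are omitted, and the exact partitioning setup may vary by Apex/Malhar version.)

import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.partitioner.StatelessPartitioner;

public class PartitionedAvroApp implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // The operator posted below in this thread.
    CustomAvroToStringOperator avroToString =
        dag.addOperator("avroToString", new CustomAvroToStringOperator());
    // Request two static partitions so more than one core can be used.
    dag.setAttribute(avroToString, OperatorContext.PARTITIONER,
        new StatelessPartitioner<CustomAvroToStringOperator>(2));
  }
}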


Thank you,

Vlad

On 5/1/18 11:57, Vivek Bhide wrote:

Hi,

I have created a new operator which converts Avro messages to JSON messages by
extending BaseOperator. I'm unable to use AvroToPojoOperator due to the
complexity of the Avro structure. I'm trying to analyze its performance.

When I use:

VCORES  MEMORY (MB)  Tuples Emitted (MA)  CPU%
1       2048         3100                 98%
1       4096         3200                 98%
2       2048         3100                 98%
2       4096         3200                 98%

*All the above Emitted and CPU% figures are approximate, based on simple tests.

 From the above analysis, an increase in the number of cores or memory didn't
affect CPU or tuples emitted. My assumption was that increasing VCORES would
actually reduce the CPU or maybe increase the tuples emitted. One of my
assumptions is that the operator is not actually using 2 GB of memory, since
it is a CPU-intensive operation, so I get that an increase in memory doesn't
really affect the tuples emitted. But an increase in VCORES should reduce the
CPU% and increase the tuples emitted, as more cores are now available for
processing.

Is my assumption wrong? If so, what can be done to reduce CPU% apart from
partitioning the operator? Below is my code for the custom operator.


import java.util.HashMap;
import java.util.Map;

import javax.validation.constraints.NotNull;

import com.datatorrent.api.Context;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;

public class CustomAvroToStringOperator extends BaseOperator {

    @NotNull
    private String schemaRegistryUrl;

    private transient KafkaAvroDeserializer avroDeserializer;

    public String getSchemaRegistryUrl() {
        return schemaRegistryUrl;
    }

    public void setSchemaRegistryUrl(String schemaRegistryUrl) {
        this.schemaRegistryUrl = schemaRegistryUrl;
    }

    // Emits each deserialized Avro record as its JSON string representation
    public final transient DefaultOutputPort<String> outputPort = new DefaultOutputPort<>();

    public final transient DefaultInputPort<byte[]> inputPort = new DefaultInputPort<byte[]>() {
        @Override
        public void process(byte[] tuple) {
            outputPort.emit(avroDeserializer.deserialize("topic", tuple).toString());
        }
    };

    @Override
    public void setup(Context.OperatorContext context) {
        avroDeserializer = setUpSchemaRegistry();
        super.setup(context);
    }

    private KafkaAvroDeserializer setUpSchemaRegistry() {
        final Map<String, Object> config = new HashMap<>();
        config.put(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
        config.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "false");
        return new KafkaAvroDeserializer(new CachedSchemaRegistryClient(schemaRegistryUrl, 1000), config);
    }
}

Regards
Vivek



--
Sent from: http://apache-apex-users-list.78494.x6.nabble.com/




Re: TimeBasedDedupOperator failing with NullPointer

2018-03-23 Thread Vlad Rozov
IMO, it is a regression introduced in 3.8.0, not a 
configuration/infrastructure or a NULL key issue. File a JIRA.


Thank you,

Vlad

On 3/23/18 10:07, Vivek Bhide wrote:

Hi Vlad,

Yes, it's 3.8.0

Regards
Vivek



--
Sent from: http://apache-apex-users-list.78494.x6.nabble.com/




Re: How to specify more than one fields as a dedup keyExpression

2017-10-23 Thread Vlad Rozov
I don't think the Apex expression evaluator is that smart :). Try 
"{$}.getId() + {$}.getId1()" or provide a getter that returns a pair object.


Thank you,

Vlad

On 10/23/17 18:13, Munagala Ramanath wrote:
It needs to be an expression that combines both (or all) values: try 
"id + id1"


Ram


On Monday, October 23, 2017, 6:04:14 PM PDT, Vivek Bhide 
 wrote:



Thanks Ram for your suggestions.

The field types that I am trying are basic primitive types. In fact, I was
just playing around with the dedup example that is available in the
Malhar git. I just added one more field, id1, with a getter and setter to the
'TestEvent' class from the test case, and I want to try dedup on the
combination of both fields.

The operator fails right in the activate() method while obtaining the
keyGetter, which is then used in getKey().

Below are a few expressions I tried.
Default value: id
Combinations tried:
id,id1
getId(),getId1()
{$}.getId()  {$}.getId1()
"getId()","getId1()"
{$}.getId(),{$}.getId1()
{{$}.getId(),{$}.getId1()}

Below is the stack trace of the exception I got most of the time:

2017-10-23 16:48:52,775 [2/Deduper:BoundedDedupOperator] WARN
util.LoggerUtil shouldFetchLogFileInformation - Log information is
unavailable. To enable log information log4j/logging should be configured
with single FileAppender that has immediateFlush set to true and log level
set to ERROR or greater.
2017-10-23 16:48:52,775 [2/Deduper:BoundedDedupOperator] ERROR
engine.StreamingContainer run - Abandoning deployment of operator
OperatorDeployInfo[id=2,name=Deduper,type=GENERIC,checkpoint={,
0,
0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=input,streamId=Generator
to
Dedup,sourceNodeId=1,sourcePortName=output,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=unique,streamId=Dedup
Unique to Console,bufferServer=localhost],
OperatorDeployInfo.OutputDeployInfo[portName=duplicate,streamId=Dedup
Duplicate to Console,bufferServer=localhost],
OperatorDeployInfo.OutputDeployInfo[portName=expired,streamId=Dedup 
Expired

to Console,bufferServer=localhost]]] due to setup failure.
java.lang.RuntimeException: 
org.codehaus.commons.compiler.CompileException:

Line 1, Column 101: ')' expected instead of ','
    at 
com.datatorrent.lib.util.PojoUtils.compileExpression(PojoUtils.java:778)
    at 
com.datatorrent.lib.util.PojoUtils.compileExpression(PojoUtils.java:746)

    at com.datatorrent.lib.util.PojoUtils.createGetter(PojoUtils.java:603)
    at com.datatorrent.lib.util.PojoUtils.createGetter(PojoUtils.java:235)
    at com.datatorrent.lib.util.PojoUtils.createGetter(PojoUtils.java:225)
    at
org.apache.apex.malhar.lib.dedup.BoundedDedupOperator.activate(BoundedDedupOperator.java:121)
    at com.datatorrent.stram.engine.Node.activate(Node.java:644)
    at 
com.datatorrent.stram.engine.GenericNode.activate(GenericNode.java:212)

    at
com.datatorrent.stram.engine.StreamingContainer.setupNode(StreamingContainer.java:1364)
    at
com.datatorrent.stram.engine.StreamingContainer.access$100(StreamingContainer.java:129)
    at
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1413) 






--
Sent from: http://apache-apex-users-list.78494.x6.nabble.com/




Re: Apache Apex with Apache slider

2017-08-04 Thread Vlad Rozov

What would be a reason to use the Apex CLI over the Launcher API?
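(For context, a hedged sketch of launching programmatically through the Launcher API instead of the CLI. The exact class and method signatures may differ between Apex releases, so treat the names below as approximations rather than a definitive recipe.)

import org.apache.apex.api.Launcher;
import org.apache.apex.api.Launcher.AppHandle;
import org.apache.apex.api.Launcher.LaunchMode;
import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.Attribute;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;

public class LauncherSketch
{
  // Minimal application; a real one would populate the DAG with operators.
  static class EmptyApp implements StreamingApplication
  {
    @Override
    public void populateDAG(DAG dag, Configuration conf)
    {
    }
  }

  public static void main(String[] args) throws Exception
  {
    Launcher<?> launcher = Launcher.getLauncher(LaunchMode.YARN);
    Attribute.AttributeMap attributes = new Attribute.AttributeMap.DefaultAttributeMap();
    AppHandle handle = launcher.launchApp(new EmptyApp(), new Configuration(), attributes);
    // The handle can later be used to check status or shut the application down.
  }
}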

Thank you,

Vlad

On 8/1/17 11:59, Thomas Weise wrote:
Slider lets you package the binaries (along with the .apa), so that 
would eliminate the need for any extra install.



On Tue, Aug 1, 2017 at 11:54 AM, Pramod Immaneni 
> wrote:


You might be able to launch the application by running the Apex
CLI as a separate process from your Slider code and passing the
.apa. The .apa could be in HDFS. This would, however, require the
Apex CLI to be present on all nodes, as your Slider code could be
running on any node in the cluster.

Thanks

On Tue, Aug 1, 2017 at 11:49 AM, Vivek Bhide
> wrote:

Yes Thomas, that's exactly what we are looking for. But using
Slider is a bit trickier, since Slider manages the deployment of
the application you give to it. Also, as far as I have read, you
need to have your whole project structured as Slider expects so
that it can manage it by moving the code to containers. Since a
regular deployment of Apex just consists of moving an .apa to some
directory on any node in the cluster (preferably an edge node) and
then launching the application through the Apex CLI using launch,
we are not sure what will change in terms of handing this task
over to Slider.

I am looking for some guidelines or a possible sample
implementation reference that can give a head start.

While searching, I found the below statement in a proposal about
Apex, which does say that 'Apache Slider is a YARN application to
deploy existing distributed applications on YARN, monitor them,
and make them larger or smaller as desired even when the
application is running. Once Slider matures, we will take a look
at close integration of Apex with Slider.'

This made me excited about using Slider :)

Regards
Vivek



--
View this message in context:

http://apache-apex-users-list.78494.x6.nabble.com/Apache-Apex-with-Apache-slider-tp1802p1806.html


Sent from the Apache Apex Users list mailing list archive at
Nabble.com.







Re: Processing speed units

2017-06-16 Thread Vlad Rozov
Questions related to a vendor-specific tool do not belong on the list. 
Please check directly with the vendor.


Thank you,

Vlad

On 6/15/17 22:53, Anushree Gupta wrote:

Hi All

On Data Torrent console the processing speed of an application is given in a
format like 'x processed/s' or 'y emitted/s' where x and y are the counts of
tuples or records processed/ emitted per second. I have a requirement where
I need it to be in 'kb/s' units. Is there some way of converting it within
DataTorrent console or an operator?

Thanks in advance,
Anushree



--
View this message in context: 
http://apache-apex-users-list.78494.x6.nabble.com/Processing-speed-units-tp1737.html
Sent from the Apache Apex Users list mailing list archive at Nabble.com.




Re: Is there a way to schedule an operator?

2017-06-15 Thread Vlad Rozov
I agree. IMO, application scheduling is not part of a streaming engine's 
functionality, and there are plenty of other projects that can help with 
it. A streaming engine, whether in a batch or streaming use case, needs to 
support:

- watermarks and triggers (with a few exceptions, mostly supported by Apex)
- effective resource utilization (contributors to help with this 
functionality are welcome)


Thank you,

Vlad

On 6/14/17 07:32, Amol Kekre wrote:


The only thing missing is kicking off a job, in case the ask is to use 
resources the batch way: "use and terminate once done". An operator 
that keeps an eye out and has the ability to kick off a job suffices. 
Kicking off a batch job can be done via any of the following:

1. Files
   -> Start after all data has arrived. Usually a .done file in a dir, 
which triggers the entire dir to be processed
   -> Start asap and end on .done
2. Message (a start message)

I think batch use cases are mainly #1 (a sketch of that pattern follows 
below). Technically this is not a batch vs. stream use case, just a 
scheduler (Oozie-like) part of batch.
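(A hedged sketch of option #1 above: an input operator that polls a directory for a ".done" marker and emits a single trigger tuple when the marker appears. Class, port, and path names are hypothetical, and error handling is kept minimal.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.common.util.BaseOperator;

public class DoneFileTrigger extends BaseOperator implements InputOperator
{
  private String watchDir = "/data/incoming";   // hypothetical directory to watch
  private boolean triggered;
  private transient FileSystem fs;

  public final transient DefaultOutputPort<String> trigger = new DefaultOutputPort<>();

  @Override
  public void setup(OperatorContext context)
  {
    try {
      fs = FileSystem.get(new Configuration());
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  @Override
  public void emitTuples()
  {
    if (triggered) {
      return;                                    // fire only once in this sketch
    }
    try {
      if (fs.exists(new Path(watchDir, ".done"))) {
        trigger.emit(watchDir);                  // downstream operators kick off the batch work
        triggered = true;
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}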


Thks
Amol


E: a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre

www.datatorrent.com


On Tue, Jun 13, 2017 at 11:47 PM, Ganelin, Ilya 
> wrote:


I think it's a very relevant use case. In the Apex formulation
this would work as follows. An operator runs continuously and
maintains internal state that tracks processed files or an offset
(e.g., in Kafka). As more data becomes available, the operator
performs the appropriate operation and then returns to waiting. In
this fashion, batched data is processed as soon as it becomes
available but the process overall is still a batch process since
it's limited by the production of the source batches.

There are a couple of examples of this in Malhar, for example the
AbstractFileInputOperator.

Your earlier comment with regards to your motivation is
interesting. Can you elaborate on the load reduction you get with
your approach? A number of batched small writes to a DB may prove
to be more efficient from a latency or database utilization
standpoint when compared with infrequent large batch writes
particularly if they involve index updates.





*From:* dashi...@yahoo.com
*Sent:* Tuesday, June 13, 2017 6:36:29 PM
*To:* guilhermeh...@gmail.com; users@apex.apache.org
*Subject:* Re: Is there a way to schedule an operator?
I have input operators that reach out to Google, Facebook, Bing,
Yahoo, etc. once a day or an hour and download marketing spend
statistics. Apex promises batch and streaming to be equal-class
citizens. How is this equality achieved if there's no scheduler
for batch jobs to rely on? I want the DAG to take the data stream
from the batch pipeline and affect streaming pipelines running
alongside. Do you not see this as a valid use case?

Sent from Yahoo Mail on Android


On Tue, Jun 13, 2017 at 5:29 PM, Guilherme Hott
> wrote:
Hi guys,

Is there a way to schedule an operator? I need an
operator to start the DAG once a day at midnight (00:00).

Best

-- 
*Guilherme Hott*

/Software Engineer/
Skype: guilhermehott
@guilhermehott
https://www.linkedin.com/in/guilhermehott





The information contained in this e-mail is confidential and/or
proprietary to Capital One and/or its affiliates and may only be
used solely in performance of work or services for Capital One.
The information transmitted herewith is intended only for use by
the individual or entity to which it is addressed. If the reader
of this message is not the intended recipient, you are hereby
notified that any review, retransmission, dissemination,
distribution, copying or other use of, or taking of any action in
reliance upon this information is strictly prohibited. If you have
received this communication in error, please contact the sender
and delete the material from your computer.






Re: Is there a way to schedule an operator?

2017-06-14 Thread Vlad Rozov
From your cron job, use the Apex client to start the application, or use the 
REST API to start it if you have a DataTorrent RTS license.


Thank you,

Vlad

On 6/13/17 15:30, Guilherme Hott wrote:
I was thinking about using a cron job to send a Kafka message; in my 
DAG I'll have a Kafka input operator to consume this message and start 
the process. I think this would work, but I would like to know if there 
is something more appropriate.


On Tue, Jun 13, 2017 at 3:25 PM, Pramod Immaneni 
> wrote:


There is no built-in scheduler to schedule the DAGs at a prescribed
time; you would need to use some external mechanism. Because it
is a daily one-time activity, would something like cron work for you?

On Tue, Jun 13, 2017 at 3:22 PM, Guilherme Hott
> wrote:

Because I am syncing my data from a table in a database to
HDFS and I want to do this just once a day to save processing use.

On Tue, Jun 13, 2017 at 2:45 PM, Ganelin, Ilya
> wrote:

Why don’t you want your dag to continue running? Are there
resources you wish to release?

- Ilya Ganelin


*From: *Guilherme Hott
*Reply-To: *"users@apex.apache.org"
*Date: *Tuesday, June 13, 2017 at 2:29 PM
*To: *"users@apex.apache.org"
*Subject: *Is there a way to schedule an operator?

Hi guys,

Is there a way to schedule an operator? I need an
operator to start the DAG once a day at midnight (00:00).

Best

-- 


*Guilherme Hott*

/Software Engineer/

Skype: guilhermehott

@guilhermehott

https://www.linkedin.com/in/guilhermehott






The information contained in this e-mail is confidential
and/or proprietary to Capital One and/or its affiliates
and may only be used solely in performance of work or
services for Capital One. The information transmitted
herewith is intended only for use by the individual or
entity to which it is addressed. If the reader of this
message is not the intended recipient, you are hereby
notified that any review, retransmission, dissemination,
distribution, copying or other use of, or taking of any
action in reliance upon this information is strictly
prohibited. If you have received this communication in
error, please contact the sender and delete the material
from your computer.




-- 
*Guilherme Hott*

/Software Engineer/
Skype: guilhermehott
@guilhermehott
https://www.linkedin.com/in/guilhermehott






--
*Guilherme Hott*
/Software Engineer/
Skype: guilhermehott
@guilhermehott
https://www.linkedin.com/in/guilhermehott





Re: Hdfs + apex-core

2017-06-06 Thread Vlad Rozov
Do you have partition.assignment.strategy set anywhere in the Kafka 
operator properties?


Thank you,

Vlad

On 6/5/17 10:14, userguy wrote:

Did anyone get a chance to look at the code?

Or do we have a template to read from multiple topics in Kafka?



--
View this message in context: 
http://apache-apex-users-list.78494.x6.nabble.com/Fwd-Hdfs-apex-core-tp1608p1677.html
Sent from the Apache Apex Users list mailing list archive at Nabble.com.




Re: Container restart after operator failure

2017-06-06 Thread Vlad Rozov
It is a portion of the application master log. In the same log, check 
which container fails first and check that container's log for errors and 
exceptions.


Apex/Malhar developers, any volunteers to see why the operator fails 
during recovery?


Thank you,

Vlad

On 6/5/17 23:17, Priyanka Gugale wrote:
Can you share your entire log file? Why did your operator get killed the 
first time? The above-mentioned error seems to occur at recovery time.


-Priyanka

On Tue, Jun 6, 2017 at 12:44 AM, Guilherme Hott 
> wrote:


Hi, I have this error and I don't know why it's happening. The
operator that is failing processes a tuple, does a dedup check,
saves it into HBase if it's new or updates it, and emits it to
the stream. But because of this failure, only a few tuples are
processed.

2017-06-04 06:43:45,265
[org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl
#6] INFO  impl.ContainerManagementProtocolProxy newProxy -
Opening proxy : localhost:8052

2017-06-04 06:43:47,573 [IPC Server handler 0 on 38848]
INFO  stram.StreamingContainerParent log - child msg:
[container_1496564390452_0002_01_10] Entering
heartbeat loop.. context:

PTContainer[id=1(container_1496564390452_0002_01_10),state=ALLOCATED,operators=[PTOperator[id=6,name=ConsoleNew],
PTOperator[id=7,name=ConsoleBad],
PTOperator[id=1,name=cloutApiBanksInput],
PTOperator[id=5,name=banksDeduplicator],
PTOperator[id=10,name=ConsoleNewJDBC],
PTOperator[id=11,name=ConsoleErrorJDBC],
PTOperator[id=9,name=cloutApiBanksOutput],
PTOperator[id=8,name=banksOpLoadObject],
PTOperator[id=3,name=Deduper],
PTOperator[id=4,name=cloutApiBanksInput.outputPort#unifier],
PTOperator[id=2,name=cloutApiBanksInput]]]

2017-06-04 06:43:48,587 [IPC Server handler 1 on 38848]
INFO  stram.StreamingContainerManager processHeartbeat -
Container container_1496564390452_0002_01_10 buffer
server: datatorrent-sandbox:35304

2017-06-04 06:43:56,262 [IPC Server handler 16 on 38848]
INFO  stram.StreamingContainerParent log - child msg:
Stopped running due to an exception.
java.lang.NullPointerException

at

com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)

at

org.apache.apex.malhar.lib.wal.FSWindowDataManager.retrieve(FSWindowDataManager.java:487)

at

org.apache.apex.malhar.lib.wal.FSWindowDataManager.retrieve(FSWindowDataManager.java:448)

at

com.datatorrent.lib.db.jdbc.AbstractJdbcPollInputOperator.replay(AbstractJdbcPollInputOperator.java:316)

at

com.datatorrent.lib.db.jdbc.AbstractJdbcPollInputOperator.beginWindow(AbstractJdbcPollInputOperator.java:203)

at
com.datatorrent.stram.engine.InputNode.run(InputNode.java:122)

at

com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1441)

 context:

PTContainer[id=1(container_1496564390452_0002_01_10),state=ACTIVE,operators=[PTOperator[id=6,name=ConsoleNew],
PTOperator[id=7,name=ConsoleBad],
PTOperator[id=1,name=cloutApiBanksInput],
PTOperator[id=5,name=banksDeduplicator],
PTOperator[id=10,name=ConsoleNewJDBC],
PTOperator[id=11,name=ConsoleErrorJDBC],
PTOperator[id=9,name=cloutApiBanksOutput],
PTOperator[id=8,name=banksOpLoadObject],
PTOperator[id=3,name=Deduper],
PTOperator[id=4,name=cloutApiBanksInput.outputPort#unifier],
PTOperator[id=2,name=cloutApiBanksInput]]]

2017-06-04 06:43:56,683 [IPC Server handler 5 on 38848]
WARN  stram.StreamingContainerManager
processOperatorFailure - Operator failure:
PTOperator[id=2,name=cloutApiBanksInput] count: 6

2017-06-04 06:43:56,683 [IPC Server handler 5 on 38848]
ERROR stram.StreamingContainerManager
processOperatorFailure - Initiating container restart
after operator failure
PTOperator[id=2,name=cloutApiBanksInput]

2017-06-04 06:43:57,292 [main] INFO
 stram.StreamingAppMasterService sendContainerAskToRM -
Requested stop container
container_1496564390452_0002_01_10

2017-06-04 06:43:57,292
[org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl
#7] INFO  impl.NMClientAsyncImpl run - Processing Event
EventType: STOP_CONTAINER for Container
container_1496564390452_0002_01_10

2017-06-04 

Re: To run Apex runner of Apache Beam

2017-06-05 Thread Vlad Rozov
I guess the TfIdf example and/or the Apex runner can't handle the local file 
system. Try using the default input.


java.lang.RuntimeException: java.io.FileNotFoundException: 
/Users/vrozov/Projects/Apache/beam/examples (Is a directory)
at 
org.apache.beam.runners.apex.translation.operators.ApexReadUnboundedInputOperator.setup(ApexReadUnboundedInputOperator.java:127)
at 
org.apache.beam.runners.apex.translation.operators.ApexReadUnboundedInputOperator.setup(ApexReadUnboundedInputOperator.java:47)
at com.datatorrent.stram.engine.Node.setup(Node.java:188)
at 
com.datatorrent.stram.engine.StreamingContainer.setupNode(StreamingContainer.java:1337)
at 
com.datatorrent.stram.engine.StreamingContainer.access$100(StreamingContainer.java:129)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1413)
Caused by: java.io.FileNotFoundException: 
/Users/vrozov/Projects/Apache/beam/examples (Is a directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:90)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:55)
at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:221)
at 
org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:457)
at 
org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:271)
at 
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.advance(UnboundedReadFromBoundedSource.java:466)
at 
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.access$300(UnboundedReadFromBoundedSource.java:444)
at 
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.advance(UnboundedReadFromBoundedSource.java:295)
at 
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.start(UnboundedReadFromBoundedSource.java:288)
at 
org.apache.beam.runners.apex.translation.operators.ApexReadUnboundedInputOperator.setup(ApexReadUnboundedInputOperator.java:125)
... 5 more


Thank you,

Vlad

On 6/2/17 18:21, Thomas Weise wrote:
It may also be helpful to enable logging to see if there are any 
exceptions in the execution. The console log only contains output from 
Maven. Is logging turned off (or a suitable slf4j backend missing)?



On Fri, Jun 2, 2017 at 5:00 PM, Thomas Weise <t...@apache.org> wrote:


Hi Claire,

The stack traces suggest that the application was launched and the
Apex DAG is deployed and might be running (it runs in embedded mode).

Do you see any output? In case it writes to files, anything in
output or output staging directories?

Thanks,
Thomas


On Thu, Jun 1, 2017 at 3:18 PM, Claire Yuan <clairey...@yahoo-inc.com> wrote:

Hi Vlad,
 Thanks for replying! Here attached my log and stack trace


On Thursday, June 1, 2017 1:31 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:


Hi Claire,

Can you provide the Maven logs? It may also help to obtain a stack
trace. In a separate terminal window, run "jps -lv" and look
for the JVM process id that executes
org.apache.beam.examples.TfIdf. Use jstack <pid> to get
the stack trace.

Thank you,

Vlad

On 6/1/17 13:00, Claire Yuan wrote:

Hi,
I am using the Apex runner to execute an Apache Beam example
pipeline. However, my terminal always gets frozen when running
the command:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.TfIdf \
  -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner

Would anyone have a solution for this?

Claire










Re: To run Apex runner of Apache Beam

2017-06-01 Thread Vlad Rozov

Hi Claire,

Can you provide the Maven logs? It may also help to obtain a stack trace. In a 
separate terminal window, run "jps -lv" and look for the JVM process id 
that executes org.apache.beam.examples.TfIdf. Use jstack <pid> to 
get the stack trace.


Thank you,

Vlad

On 6/1/17 13:00, Claire Yuan wrote:

Hi,
  I am using the Apex runner to execute an Apache Beam example pipeline.
However, my terminal always gets frozen when running the command:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.TfIdf \
  -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner" -Papex-runner

  Would anyone have a solution for this?

Claire




Re: Container failure without relaunch

2017-05-31 Thread Vlad Rozov
It may also help to enable DEBUG level logging for com.datatorrent.* 
once the issue is reproduced again and check activity in the application 
master logs.


Thank you,

Vlad

On 5/30/17 10:41, Sandesh Hegde wrote:
When that issue happens, please check the free resources (CPU and 
memory) available to YARN.


On Tue, May 30, 2017 at 10:36 AM Ganelin, Ilya 
> wrote:


I think I checked this and I don’t see any activity whatsoever. No
re-launch, just empty tabs. I’ll try to provide a screenshot next
time it happens.

- Ilya Ganelin


*From: *Pramod Immaneni
*Reply-To: *"users@apex.apache.org"
*Date: *Tuesday, May 30, 2017 at 10:17 AM
*To: *"users@apex.apache.org"
*Cc: *DataTorrent Users Group
*Subject: *Re: Container failure without relaunch

Hi Ilya,

What is the state of the physical containers in the physical
tab. Are the containers dying and continuously restarting.

Thanks

On Tue, May 30, 2017 at 10:11 AM, Ganelin, Ilya wrote:

Hi all – several times now I’ve noticed odd behavior with our
app. When running for several days or more, I’ll observe that
following an operator failure, the container does not
relaunch. I’m not sure what accounts for this, I don’t see any
further errors in the log following the initial “stop” +
“operator remove, it’s as if recovery is not working. Any
thoughts on what could be causing this?

- Ilya Ganelin



The information contained in this e-mail is confidential
and/or proprietary to Capital One and/or its affiliates and
may only be used solely in performance of work or services for
Capital One. The information transmitted herewith is intended
only for use by the individual or entity to which it is
addressed. If the reader of this message is not the intended
recipient, you are hereby notified that any review,
retransmission, dissemination, distribution, copying or other
use of, or taking of any action in reliance upon this
information is strictly prohibited. If you have received this
communication in error, please contact the sender and delete
the material from your computer.




The information contained in this e-mail is confidential and/or
proprietary to Capital One and/or its affiliates and may only be
used solely in performance of work or services for Capital One.
The information transmitted herewith is intended only for use by
the individual or entity to which it is addressed. If the reader
of this message is not the intended recipient, you are hereby
notified that any review, retransmission, dissemination,
distribution, copying or other use of, or taking of any action in
reliance upon this information is strictly prohibited. If you have
received this communication in error, please contact the sender
and delete the material from your computer.





Re: Exceeded allowed memory block for AvroFileInputOperator

2017-05-30 Thread Vlad Rozov
The warning means that the downstream operator(s) cannot keep up with the 
upstream operator, and tuples are accumulating in the upstream buffer server, 
leading to increased latency. If the latency is within the SLA, you may ignore 
the warning; otherwise, try to partition the downstream operator or decrease 
the emit rate.


Thank you,

Vlad

On 5/26/17 07:17, Vivek Bhide wrote:

I have an AvroFileInputOperator which has a large number of files to be read when
the application starts. While the file reading goes well at the start, I
start getting the below warning in the log after reading around 50+ files.

Is this something to be worried about, and is there a way to mitigate it?

2017-05-26 09:10:36,233 INFO  fs.AbstractFileInputOperator
(AbstractFileInputOperator.java:openFile(796)) - opening file
hdfs:/partition_date=2017-05-05/00_0_copy_28
2017-05-26 09:11:32,145 WARN  internal.DataList (DataList.java:run(743)) -
Exceeded allowed memory block allocation by 1
2017-05-26 09:11:58,595 WARN  internal.DataList (DataList.java:run(743)) -
Exceeded allowed memory block allocation by 1

Regards
Vivek



--
View this message in context: 
http://apache-apex-users-list.78494.x6.nabble.com/Exceeded-allowed-memory-block-for-AvroFileInputOperator-tp1651.html
Sent from the Apache Apex Users list mailing list archive at Nabble.com.




Re: Error while using AvroToPojo operator

2017-05-10 Thread Vlad Rozov
Do not use the Janino commons-compiler-jdk artifact; it has a bug that prevents 
it from working properly with PojoUtils. Replace it with the Janino 
commons-compiler artifact.


Thank you,

Vlad

On 5/10/17 13:53, bhidevivek wrote:

I am trying to read the HDFS audit logs in Avro format and convert them to
POJOs using the combination of AvroFileInputOperator and AvroToPojo, but the
application launch fails with the below error.
I have verified that the TUPLE_CLASS property for the AvroToPojo operator is set
to the correct POJO class from the project, and it has all the required public
fields with getters and setters. Is there anything that I am missing here?

2017-05-10 13:31:28,823 DEBUG com.datatorrent.stram.FSEventRecorder: Adding
event OperatorError to the queue
2017-05-10 13:31:28,823 INFO com.datatorrent.stram.StreamingContainerParent:
child msg: Abandoning deployment due to setup failure.
java.lang.RuntimeException: org.codehaus.commons.compiler.CompileException:
Line 1, Column 0: cannot find symbol
   symbol:   class PojoUtils$Setter
   location: package com.datatorrent.lib.util
(compiler.err.cant.resolve.location)
at 
com.datatorrent.lib.util.PojoUtils.compileExpression(PojoUtils.java:778)
at 
com.datatorrent.lib.util.PojoUtils.compileExpression(PojoUtils.java:746)
at com.datatorrent.lib.util.PojoUtils.createSetter(PojoUtils.java:684)
at com.datatorrent.lib.util.PojoUtils.createSetter(PojoUtils.java:442)
at com.datatorrent.lib.util.PojoUtils.createSetter(PojoUtils.java:432)
at
com.tgt.dqs.operator.AvroToPojoCustom.initializeActiveFieldSetters(AvroToPojoCustom.java:400)
at
com.tgt.dqs.operator.AvroToPojoCustom.access$700(AvroToPojoCustom.java:69)
at 
com.tgt.dqs.operator.AvroToPojoCustom$2.setup(AvroToPojoCustom.java:290)
at 
com.tgt.dqs.operator.AvroToPojoCustom$2.setup(AvroToPojoCustom.java:272)
at
com.datatorrent.stram.engine.StreamingContainer.setupNode(StreamingContainer.java:1357)
at
com.datatorrent.stram.engine.StreamingContainer.access$100(StreamingContainer.java:129)
at
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1413)
Caused by: org.codehaus.commons.compiler.CompileException: Line 1, Column 0:
cannot find symbol
   symbol:   class PojoUtils$Setter
   location: package com.datatorrent.lib.util
(compiler.err.cant.resolve.location)
at
org.codehaus.commons.compiler.jdk.SimpleCompiler$3.report(SimpleCompiler.java:322)
at
com.sun.tools.javac.api.ClientCodeWrapper$WrappedDiagnosticListener.report(ClientCodeWrapper.java:593)
at com.sun.tools.javac.util.Log.writeDiagnostic(Log.java:616)
at
com.sun.tools.javac.util.Log$DefaultDiagnosticHandler.report(Log.java:600)
at com.sun.tools.javac.util.Log.report(Log.java:562)
at com.sun.tools.javac.comp.Resolve.logResolveError(Resolve.java:3514)
at com.sun.tools.javac.comp.Resolve.accessInternal(Resolve.java:2219)
at com.sun.tools.javac.comp.Resolve.accessBase(Resolve.java:2262)
at com.sun.tools.javac.comp.Attr.selectSym(Attr.java:3390)
at com.sun.tools.javac.comp.Attr.visitSelect(Attr.java:3278)
at 
com.sun.tools.javac.tree.JCTree$JCFieldAccess.accept(JCTree.java:1897)
at com.sun.tools.javac.comp.Attr.attribTree(Attr.java:576)
at com.sun.tools.javac.comp.Attr.attribType(Attr.java:638)
at com.sun.tools.javac.comp.Attr.attribType(Attr.java:631)
at com.sun.tools.javac.comp.Attr.attribBase(Attr.java:786)
at com.sun.tools.javac.comp.MemberEnter.complete(MemberEnter.java:1072)
at com.sun.tools.javac.code.Symbol.complete(Symbol.java:574)
at 
com.sun.tools.javac.code.Symbol$ClassSymbol.complete(Symbol.java:1037)
at com.sun.tools.javac.comp.Enter.complete(Enter.java:493)
at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
at 
com.sun.tools.javac.main.JavaCompiler.enterTrees(JavaCompiler.java:982)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:857)
at com.sun.tools.javac.main.Main.compile(Main.java:523)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
at
org.codehaus.commons.compiler.jdk.SimpleCompiler.cook(SimpleCompiler.java:353)
at
org.codehaus.commons.compiler.jdk.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:197)
at
org.codehaus.commons.compiler.jdk.ScriptEvaluator.cook(ScriptEvaluator.java:426)
at
org.codehaus.commons.compiler.jdk.ScriptEvaluator.cook(ScriptEvaluator.java:323)
at
org.codehaus.commons.compiler.jdk.ScriptEvaluator.cook(ScriptEvaluator.java:273)
at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:50)
at
org.codehaus.commons.compiler.jdk.ScriptEvaluator.createFastEvaluator(ScriptEvaluator.java:566)
at

Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
No, it does not mean that you have to use binaries from a vendor like 
DataTorrent or BigTop. You may still build them yourself.


Thank you,

Vlad

On 5/10/17 15:57, apex user wrote:

So, does it mean that I cannot use Apex without opting for binaries from 
various vendors like DataTorrent or BigTop?

Thanks
On 2017-05-10 15:54 (-0700), Vlad Rozov <v.ro...@datatorrent.com> wrote:

It is a source release download, not a binary download. You will need to
build binaries. Also, documentation assumes an installation from a
binary download.

Thank you,

Vlad

On 5/10/17 15:46, apex user wrote:

I downloaded from https://apex.apache.org/downloads.html, and Apex Source 
Releases - 3.6.0(apex-3.6.0-source-release.zip)

Thanks

On 2017-05-10 15:44 (-0700), Vlad Rozov <v.ro...@datatorrent.com> wrote:

Which download do you use?

Thank you,

Vlad

On 5/10/17 14:43, apex user wrote:

Hi All,

Can I work with Apache Apex independently without installing DataTorrent RTS 
binaries? I see from the Apache Apex Documentation, I can access Apex CLI from 
/bin/apex. But, when I download it from Apache Apex 
Downloads I don't see any /bin folder. Is there something I am missing here?

Thank you.






Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
It is a source release download, not a binary download. You will need to 
build binaries. Also, documentation assumes an installation from a 
binary download.


Thank you,

Vlad

On 5/10/17 15:46, apex user wrote:

I downloaded from https://apex.apache.org/downloads.html, and Apex Source 
Releases - 3.6.0(apex-3.6.0-source-release.zip)

Thanks

On 2017-05-10 15:44 (-0700), Vlad Rozov <v.ro...@datatorrent.com> wrote:

Which download do you use?

Thank you,

Vlad

On 5/10/17 14:43, apex user wrote:

Hi All,

Can I work with Apache Apex independently without installing DataTorrent RTS 
binaries? I see from the Apache Apex Documentation, I can access Apex CLI from 
/bin/apex. But, when I download it from Apache Apex 
Downloads I don't see any /bin folder. Is there something I am missing here?

Thank you.






Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov

Which download do you use?

Thank you,

Vlad

On 5/10/17 14:43, apex user wrote:

Hi All,

Can I work with Apache Apex independently without installing DataTorrent RTS 
binaries? I see from the Apache Apex Documentation, I can access Apex CLI from 
/bin/apex. But, when I download it from Apache Apex 
Downloads I don't see any /bin folder. Is there something I am missing here?

Thank you.




Re: Window id got stuck

2017-05-09 Thread Vlad Rozov

Sorry, I meant upstream. Can you also provide the Apex version?

Thank you,

Vlad

On 5/8/17 21:57, chiranjeevi vasupilli wrote:
In the killed container we have one writer operator to HDFS and a 
default unifier. The writer operator receives data from other upstream 
operators (32 partitions) running in separate containers. There is no 
functional processing happening in the blocked operators.


There are no downstream operators.

Please suggest.



On Mon, May 8, 2017 at 8:58 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:


The exception below means that a downstream operator abruptly
disconnected. It is not an indication of a problem by itself.
Please check the downstream operator's container log for exceptions and
error messages.

Thank you,

Vlad

On 5/8/17 07:12, Pramod Immaneni wrote:

Hi Chiranjeevi,

I am assuming the operator that is upstream and feeding data to
this one is progressing properly. What is this operator doing? Is
it doing any blocking operations, for example, communicating with
some external systems in a blocking fashion? Do you see any other
exceptions before the above one you mentioned?

Thanks

On Mon, May 8, 2017 at 3:41 AM, chiranjeevi vasupilli <chiru@gmail.com> wrote:

Hi Team,

In my use case, one of the operators' window id got stuck, and
after the timeout it is getting killed.
In the logs we can see:
[ProcessWideEventLoop] ERROR
netlet.AbstractLengthPrependerClient handleException -
Disconnecting
Subscriber{id=tcp://hostname:57968/323.rsnOutput.1} because
of an exception.
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method).

Before the container gets killed, we see the above exception in the
logs.

Please suggest reasons for the window id getting stuck and how
to debug it further.

Thanks
Chiranjeevi V







--
ur's
chiru




Re: Window id got stuck

2017-05-08 Thread Vlad Rozov
The exception below means that a downstream operator abruptly 
disconnected. It is not an indication of a problem by itself. Please 
check the downstream operator's container log for exceptions and error messages.


Thank you,

Vlad

On 5/8/17 07:12, Pramod Immaneni wrote:

Hi Chiranjeevi,

I am assuming the operator that is upstream and feeding data to this 
one is progressing properly. What is this operator doing? Is it doing 
any blocking operations, for example, communicating with some external 
systems in a blocking fashion? Do you see any other exceptions before 
the above one you mentioned?


Thanks

On Mon, May 8, 2017 at 3:41 AM, chiranjeevi vasupilli 
> wrote:


Hi Team,

In my use case, one of the operators' window id got stuck and after
the timeout it is getting killed.
In the logs we can see
[ProcessWideEventLoop] ERROR netlet.AbstractLengthPrependerClient
handleException - Disconnecting
Subscriber{id=tcp://hostname:57968/323.rsnOutput.1} because of an
exception.
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method).

Before the container gets killed, we see the above exception in the logs.

Please suggest reasons for the window id getting stuck and how to
debug it further.

Thanks
Chiranjeevi V






Re: One Year Anniversary of Apache Apex

2017-04-25 Thread Vlad Rozov

Congratulations to the community!

Thank you,

Vlad

On 4/25/17 08:14, Thomas Weise wrote:

Hi,

It's been one year for Apache Apex as top level project, congratulations to
the community!

I wrote this blog to reflect and look ahead:

http://www.atrato.io/blog/2017/04/25/one-year-apex/

Your comments and suggestions are welcome.

Thanks,
Thomas





Re: DB connection getting lost

2017-04-18 Thread Vlad Rozov
Do you call JdbcStore from the operator thread, or do you share it across 
multiple threads? In the latter case, the behavior is undefined.


Thank you,

Vlad

On 4/17/17 02:10, Priyanka Gugale wrote:

Can you make sure you have called connect() before createStatement()?
Also, if possible, please share the code where you make the connection and 
create the statement in the operator.
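(A hedged sketch of that pattern: connect the store in setup() and only create statements afterwards. Class, query, and property names are illustrative, and exception handling is simplified.)

import java.sql.PreparedStatement;
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.common.util.BaseOperator;
import com.datatorrent.lib.db.jdbc.JdbcStore;

public class JdbcLookupOperator extends BaseOperator
{
  // databaseDriver, databaseUrl and connection properties are assumed to be
  // set on the store via operator properties/configuration.
  private JdbcStore store = new JdbcStore();
  private transient PreparedStatement lookupStatement;

  @Override
  public void setup(OperatorContext context)
  {
    try {
      store.connect();   // establish the connection first
      lookupStatement = store.getConnection().prepareStatement(
          "SELECT name FROM accounts WHERE id = ?");
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  @Override
  public void teardown()
  {
    try {
      store.disconnect();
    } catch (Exception e) {
      // ignore failures on shutdown
    }
  }
}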


-Priyanka

On Mon, Apr 17, 2017 at 2:22 PM, chiranjeevi vasupilli 
> wrote:


We have utility class from where we are connecting to DB
using com.datatorrent.lib.db.jdbc.JdbcStore.

On Mon, Apr 17, 2017 at 1:50 PM, Priyanka Gugale
> wrote:

Have you written a custom operator, or are you
using JdbcInputOperator from Malhar?

-Priyanka

On Mon, Apr 17, 2017 at 1:27 PM, chiranjeevi vasupilli
> wrote:

Hi Priyanka,

In the logs we can see a NullPointerException while
preparing the statement.

I have verified that after calling the connect() method of
JdbcStore, I called store.getConnection() to print the
connection object. But when I use the connection to
prepare the statement, we are getting a NullPointerException.

We are not sure why it is becoming null or whether we
received a dead connection.

Please suggest.

Thanks
Chiranjeevi V

On Mon, Apr 17, 2017 at 12:19 PM, Priyanka Gugale
> wrote:

Do you see any exceptions in your log file? If yes
please share.
Also you mentioned "connection getting established" -
how do you verify that?

-Priyanka

On Thu, Apr 13, 2017 at 3:17 PM, chiranjeevi vasupilli
> wrote:

Hi Team,

In my use case, I'm using JdbcStore from the Malhar
lib to get an Oracle connection. But I'm getting some
strange behavior: the connection gets established,
and when I call *store.getConnection* it returns
null. Due to this, the containers are getting
killed in my app.

The established connection became dead and is unable to
query. Please suggest how to handle such a scenario.

Thanks
Chiranjeevi V






-- 
ur's

chiru





-- 
ur's

chiru






Re: Restricting emit speed of AbstractFileInputOperator.

2017-04-10 Thread Vlad Rozov
If it takes t to emit one tuple and t*x < 0.5 sec, emitBatchSize can be 
set to x. This will help with CPU utilization, as the platform will put the 
operator thread to sleep if it does not emit anything and does not have a 
handleIdleTime() callback defined.


Thank you,

Vlad

On 4/7/17 18:53, Bhupesh Chawda wrote:

I think the understanding is wrong.

The platform calls emitTuples() multiple times in a window. This number is 
unknown; it depends on the window time. We can limit this to x.


emitBatchSize controls the number of tuples emitted in one such call. 
Set this to 1.


This should result in at most x tuples per window. Note that it can be 
less than x as well.
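(A hedged sketch of the counting approach described above and in the replies below: cap the number of tuples emitted per window and reset the count in beginWindow(). To stay self-contained it uses a trivial in-memory source; for the original question the same two overrides would go into an AbstractFileInputOperator subclass with emitBatchSize set to 1.)

import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.common.util.BaseOperator;

public class RateLimitedInput extends BaseOperator implements InputOperator
{
  private int maxTuplesPerWindow = 1000;   // the 'x' from the discussion
  private transient int emittedInWindow;
  private long sequence;                   // stand-in data source

  public final transient DefaultOutputPort<Long> output = new DefaultOutputPort<>();

  @Override
  public void beginWindow(long windowId)
  {
    emittedInWindow = 0;                   // reset the per-window count
  }

  @Override
  public void emitTuples()
  {
    if (emittedInWindow >= maxTuplesPerWindow) {
      return;                              // cap reached; wait for the next window
    }
    output.emit(sequence++);
    emittedInWindow++;
  }
}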


~ Bhupesh




On Apr 8, 2017 01:56, "Ambarish Pande" > wrote:


Hello Bhupesh Sir,
But does that mean I am emitting only 'x' lines from the file?
Because from what I understood, emitTuples() emits multiple lines
in a single call and emitBatchSize controls the number of times
emitTuples() is called in a window. Am I right? I inferred this
from the following:

The platform invokes the emitTuples() callback multiple times
in each streaming window; within a single such call, if a
large number of tuples are emitted, there is some risk that
they may overwhelm the downstream operators, especially if they
are performing some compute-intensive operation.


Thank You.

On Fri, Apr 7, 2017 at 1:44 PM, Bhupesh Chawda
> wrote:

You can set emitBatchSize to 1 and make sure emitTuples is
called just 'x' times within a window. You can do this
manually by keeping a count and resetting it in beginWindow().

~ Bhupesh

___

Bhupesh Chawda

E: bhup...@datatorrent.com |
Twitter: @bhupeshsc

www.datatorrent.com   |
apex.apache.org 

On Fri, Apr 7, 2017 at 1:38 PM, Ambarish Pande
> wrote:

Yes i tried. That just gives me control on how many times
emitTuples is called. I want control on number of tuples
emitted.
Thank you.
Sent from my iPhone
On 07-Apr-2017, at 8:08 AM, Yogi Devendra
> wrote:

Have you tried /emitBatchSize /as mentioned
https://apex.apache.org/docs/malhar/operators/fsInputOperator/

~ Yogi
On 3 April 2017 at 00:05, Ambarish Pande
> wrote:

Hi,
How can I make the AbstractFileInputOperator emit
only 'x' number of lines per window? Is there a hook
for that, or do I have to do it manually?
Thank You.



Re: Blocked Operator

2017-03-07 Thread Vlad Rozov
Please check the stack trace of the thread where the operator runs. The main 
thread in a container is responsible for heartbeat monitoring, and what 
you see in the stack trace is a wait between heartbeats.


Thank you,

Vlad

Join us at Apex Big Data World - San Jose, April 4, 2017
http://www.apexbigdata.com/san-jose-register.html


On 3/7/17 00:57, chiranjeevi vasupilli wrote:

Hi Team,

In my application the reader operator's window id got stuck and did 
not move further. When we checked the stack trace we found the below 
TIMED_WAITING (on object monitor), and the container got killed after 
TIMEOUT_WINDOW_COUNT.


Can you please tell me why the operator got stuck and is unable to 
proceed further.



"main" prio=10 tid=0x7f1f8801d800 nid=0x11c1 in Object.wait() 
[0x7f1f8f0c6000]

   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x7f112eb65360> (a java.lang.Object)
at com.datatorrent.stram.engine.StreamingContainer.heartbeatLoop(StreamingContainer.java:610)

- locked <0x7f112eb65360> (a java.lang.Object)
at 
com.datatorrent.stram.engine.StreamingContainer.main(StreamingContainer.java:280)


Let me know if you need further information.

--
ur's
chiru




Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Vlad Rozov
Please see the Threads section at 
http://apex.apache.org/docs/apex/development_best_practices/


Note that an operator has only one thread, created by Apex, that it 
can use to interact with the system. Starting with the 3.5.0 release, this 
is enforced by the platform, and it will throw an exception if an 
operator tries to emit on a different thread.


Thank you,

Vlad

Join us at Apex Big Data World - San Jose, April 4, 2017
http://www.apexbigdata.com/san-jose-register.html


On 2/21/17 11:54, Munagala Ramanath wrote:
To amplify Sandesh's answer a bit, the main operator thread invokes 
user callbacks like beginWindow(), endWindow(), process() method of 
input ports, and emitTuples() in input operators.


Additionally, if the operator implements the IdleTimeHandler, and if 
the operator is deemed to be idle, the handleIdleTime() callback will 
be called. All tuple output must be done in one of these callbacks.


So you can check the thread-safe queue in any of these callbacks and 
emit output tuples as needed.
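(A hedged sketch of that pattern: a worker thread enqueues results into a thread-safe queue, and only the operator thread, here in emitTuples(), drains and emits them. The fetch() call and names are illustrative.)

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.common.util.BaseOperator;

public class AsyncSourceOperator extends BaseOperator implements InputOperator
{
  private final transient Queue<String> results = new ConcurrentLinkedQueue<>();
  private transient Thread worker;
  private transient volatile boolean running;

  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

  @Override
  public void setup(OperatorContext context)
  {
    running = true;
    worker = new Thread(() -> {
      while (running) {
        results.add(fetch());   // the worker only enqueues; it never emits
      }
    });
    worker.start();
  }

  @Override
  public void emitTuples()
  {
    // Drain and emit on the operator thread only.
    for (String tuple; (tuple = results.poll()) != null; ) {
      output.emit(tuple);
    }
  }

  @Override
  public void teardown()
  {
    running = false;
  }

  private String fetch()
  {
    return "record";            // placeholder for a blocking call to an external system
  }
}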


Ram

On Tue, Feb 21, 2017 at 11:42 AM, Sandesh Hegde 
> wrote:


-Removed dev@

Operators can implement IdleTimeHandler.

https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Operator.IdleTimeHandler.html



On Tue, Feb 21, 2017 at 11:33 AM Sunil Parmar
> wrote:

Ram,
Thanks for the prompt response. If we use the approach you
suggested, we're dependent on the main thread's process() call,
i.e., tuples in the thread-safe queue only get processed when the
main thread is processing incoming tuples. How can we explicitly
trigger that processing from polling of the delay queue?

Just for reference, here's the sample code snippet for our
operator.

public class MyOperator extends BaseOperator implements Operator.ActivationListener<Context.OperatorContext>
{
    …..

    @InputPortFieldAnnotation
    public transient DefaultInputPort<String> kafkaStreamInput = new DefaultInputPort<String>()
    {
        List<String> errors = new ArrayList<>();

        @Override
        public void process(String consumerRecord)
        {
            // Code for normal tuple processing
            // Code to poll the thread-safe queue
        }


*From: *Munagala Ramanath
*To: *users@apex.apache.org
*CC: *d...@apex.apache.org, Allan De Leon, Tim Zhu
*Subject: *Re: Occasional Out of order tuples when emitting from a thread
*Date: *2017-02-21 10:08 (-0800)
*List: *users@apex.apache.org


Please note that tuples should not be emitted by any thread other than 
the
main operator thread.

A common pattern is to use a thread-safe queue and have worker threads
enqueue
tuples there; the main operator thread then pulls tuples from the queue 
and
emits them.

Ram

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com | apex.apache.org


From: Sunil Parmar
Date: Tuesday, February 21, 2017 at 10:05 AM
To: "users@apex.apache.org", "d...@apex.apache.org"
Cc: Allan De Leon, Tim Zhu
Subject: Occasional Out of order tuples when emitting from a thread
Hi there,
We have the following setup:

  * we have a generic operator that’s processing tuples in its
input port
  * in the input port’s process method, we check for a
condition, and:
  o if the condition is met, the tuple is emitted to the
next operator right away (in the process method)
  o 

Re: Debugging operator latency.

2017-02-21 Thread Vlad Rozov
The name of the property is dt.attr.CONTAINER_JVM_OPTIONS. You don't 
need to replace <LOG_DIR> with the actual path, as it is not known until 
the application and its containers are allocated by YARN (Apex will replace 
<LOG_DIR> with the container log directory). You may also set the 
attribute at launch time: 
-Ddt.attr.CONTAINER_JVM_OPTIONS="-Xloggc:<LOG_DIR>/gc.log -verbose:gc 
-XX:+PrintGCDateStamps" or using the DT RTS console.


Thank you,

Vlad

Join us at Apex Big Data World - San Jose, April 4, 2017
http://www.apexbigdata.com/san-jose-register.html


On 2/21/17 00:44, Ambarish Pande wrote:
I can't find the gc.log file in the container logs folder. Do I have to 
replace the LOG_DIR with a path?


On Tue, Feb 21, 2017 at 12:21 PM, Tushar Gosavi 
> wrote:


Hi Ambarish,

You could add the following property in your application's
properties.xml file.

<property>
  <name>dt.application.*.attr.containerJvmOpts</name>
  <value>-Xloggc:LOG_DIR/gc.log -verbose:gc -XX:+PrintGCDateStamps</value>
</property>

When this property is used, you will see a gc.log file in the container
directory.

- Tushar.


> On Tue, Feb 21, 2017 at 12:06 PM, Ambarish Pande wrote:
> Hello,
>
> I tried enabling gclogs in hadoop configurations. Do I need to
enable it in
> Datatorrent RTS somewhere or in my app?. If so, how should I do it?
>
> Thank You
>
> On Fri, Feb 17, 2017 at 1:20 PM, Ashwin Chandra Putta
> > wrote:
>>
>> It is probably not able to keep up with the throughput and
might need more
>> partitions. Check the CPU and memory utilization of its
container. If memory
>> allocated to container is too low, it might be hitting GC too
often. You can
>> enable GC logging and check GC logs.
>>
>> Regards,
>> Ashwin.
>>
>>
>> On Feb 16, 2017 11:12 PM, "Ambarish Pande"
>
>> wrote:
>>
>> Hello,
>>
>> I wanted to know why my operator latency is increasing. Which
logs should
>> I check to get any idea about that. I have checked the
container dt.log ,
>> stderr and stdout, but I cannot find anything there.
>>
>> Thank You
>>
>>
>






Re: Debugging operator latency.

2017-02-17 Thread Vlad Rozov
I would start by taking a few stack traces (the latest DataTorrent RTS 
allows taking a stack trace; otherwise use jstack) of the container 
where the operator is deployed and trying to identify what the operator's 
thread is doing. If there is nothing obvious there, proceed as Ashwin 
suggested.


Thank you,

Vlad

Join us at Apex Big Data World - San Jose, April 4, 2017
http://www.apexbigdata.com/san-jose-register.html


On 2/16/17 23:50, Ashwin Chandra Putta wrote:
It is probably not able to keep up with the throughput and might need 
more partitions. Check the CPU and memory utilization of its 
container. If memory allocated to container is too low, it might be 
hitting GC too often. You can enable GC logging and check GC logs.


Regards,
Ashwin.


On Feb 16, 2017 11:12 PM, "Ambarish Pande" wrote:


Hello,

I wanted to know why my operator latency is increasing. Which logs
should I check to get an idea about that? I have checked the
container dt.log, stderr, and stdout, but I cannot find anything
there.

Thank You






Re: Logging

2017-01-08 Thread Vlad Rozov
One option is to use a SocketAppender, but I would not recommend centralized 
real-time logging for production - it is not scalable. At some point, 
logging to a single place will lead to application performance 
degradation.


Thank you,

Vlad

On 1/5/17 06:11, Doyle, Austin O. wrote:


What’s the best way to get real time logging from apex in one place?

Thanks,

Austin





Re: Data duplication between operators

2016-12-19 Thread Vlad Rozov
This would be a bug unless the downstream operator constantly fails and 
is restored to a checkpoint, in which case it is expected that it may 
receive the same tuple multiple times.


Thank you,

Vlad

On 12/19/16 11:33, Doyle, Austin O. wrote:


I’m trying to send some sequential data between an S3 Input Operator 
and a CSV Parser operator.  I added logging to see what the outputPort 
is emitting and it seems to be straightforward (data points 1 to 
1000).  I added logging on the input of the CSV Parser which receives 
1000 data points but not the correct data points.  It actually 
receives random data points multiple times (like point 57 twenty or so 
times).  Has anyone seen anything like this?


Thanks,

Austin





Re: Exception with API

2016-12-12 Thread Vlad Rozov

Hi Brandon,

Check the .apa package validity. Try to manually copy the .apa file to 
the node where the gateway is installed, start apex using the - option, and 
run "get-app-package-info ".


Thank you,

Vlad

On 12/12/16 07:20, Feldkamp, Brandon (CONT) wrote:


Hello,

I’m getting the below exception when trying to use the API to upload 
an app. Any idea what it means? I’ve gotten this to work previously.


2016-12-12 10:18:59,893 ERROR com.datatorrent.gateway.y: Command 
failed and returned this in stderr:


2016-12-12 10:18:59,894 INFO 
com.datatorrent.gateway.resources.ws.v2.WSResource: Caught exception 
in processing web service: com.datatorrent.gateway.s:


2016-12-12 10:18:59,911 ERROR 
com.datatorrent.stram.util.StreamGobbler: Caught exception


java.io.IOException: Stream closed

at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)

at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)

at java.io.BufferedInputStream.read(BufferedInputStream.java:345)

at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)

at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)

at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)

at java.io.InputStreamReader.read(InputStreamReader.java:184)

at java.io.BufferedReader.fill(BufferedReader.java:161)

at java.io.BufferedReader.readLine(BufferedReader.java:324)

at java.io.BufferedReader.readLine(BufferedReader.java:389)

at com.datatorrent.stram.util.StreamGobbler.run(StreamGobbler.java:62)

Thanks!

Brandon










Re: Query

2016-12-06 Thread Vlad Rozov
I'd recommend using the additional output port solution outlined by 
Bhupesh. There are a few Apex applications in the field that leverage that 
solution.


Thank you,

Vlad
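
A minimal sketch of the pattern Bhupesh outlines below, assuming hypothetical 
class and port names (FileReaderOperator, data, doneSignal): the input 
operator emits an end-of-data tuple on a dedicated port one window after the 
last data tuple, and the writer connects an input port to that signal to 
trigger its final write.

import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.common.util.BaseOperator;

public class FileReaderOperator extends BaseOperator implements InputOperator
{
  public final transient DefaultOutputPort<String> data = new DefaultOutputPort<>();
  // dedicated port that carries only the end-of-data signal
  public final transient DefaultOutputPort<Boolean> doneSignal = new DefaultOutputPort<>();

  private boolean readingDone;      // set once the last record has been emitted
  private boolean signalSent;       // make sure the signal goes out exactly once
  private long doneWindowId = -1;   // window in which reading completed
  private transient long currentWindowId;

  @Override
  public void beginWindow(long windowId)
  {
    currentWindowId = windowId;
  }

  @Override
  public void emitTuples()
  {
    if (!readingDone) {
      // read records and emit them on the data port here; when the input is
      // exhausted, remember the window in which that happened:
      //   readingDone = true;
      //   doneWindowId = currentWindowId;
    } else if (!signalSent && currentWindowId > doneWindowId) {
      // a later window than the one that finished reading: emit the signal
      doneSignal.emit(Boolean.TRUE);
      signalSent = true;
      // to stop the whole DAG afterwards, the operator could throw
      // com.datatorrent.api.Operator.ShutdownException, as Bhupesh suggests below
    }
  }
}

Because an operator processes a window only after the previous windows are 
completely processed, as explained further down the thread, the writer can 
safely flush to the external system when the signal tuple arrives.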

On 12/4/16 11:45, Vishal Agrawal wrote:

Thank you Bhupesh and Ram. Appreciate your quick response.

I see ThrottlingStatsListener.processStats() method gets called 
whenever new stats are received from the operators.
How frequently these stats are sent by the operators? Is it end of 
every window?



Thanks,
Vishal



On Sat, Dec 3, 2016 at 1:46 PM, Munagala Ramanath > wrote:


To further clarify Bhupesh's comment, suppose you determine in
window N in the input operator the
data reading phase is complete and send the control tuple on the
dedicated port to the output
operator in window N+1. If the downstream operators (including the
output operator) P_i are
processing respective windows W_i the output operator will
not actually see that control tuple until all the W_i have reached
N+1.

Another option is to use the OperatorRequest mechanism to
communicate among the operators
out-of-band; an example is at:
https://github.com/DataTorrent/examples/tree/master/tutorials/throttle

That example shows how to modulate the speed of upstream operators
but it can be adapted for
your scenario by checking and recording the "completion status" of
all the operators.

Ram

On Sat, Dec 3, 2016 at 5:10 AM, Bhupesh Chawda
> wrote:

Hi Vishal,



A window is processed by an operator only when the previous
window is completely processed. When you send the control
tuple in a new window, you can be sure that all previous
windows have been processed.


That is the reason I asked you to send the control tuple in a
new window.



For shutdown, you can try throwing a  ShutdownException() from
the input operator. This will propagate through the entire Dag
and shutdown all the operators in sequence.



~ Bhupesh




On Dec 3, 2016 18:15, "Vishal Agrawal"
> wrote:

Thank you Bhupesh.

Another catch is just because input operator has processed
last record doesn't mean all the intermediate operators
have processed it as well. How can I ensure that all the
operators have processed all the records before performing
the write operation.

Also is there a way to shutdown the dag programmatically
once it has performed the write operation.


Thanks,
Vishal


On Fri, Dec 2, 2016 at 11:11 PM Bhupesh Chawda
>
wrote:

Hi Vishal,

The support for such operations is currently being
enhanced in Apex.

For now, you can do the following:
 - Have an additional output port in your input
operator as well as an input port in the "Writer"
operator.
 - Once the Input operator has read and emitted all
the data that it wanted to, you can send a tuple on
the new port that you have created. This tuple will
act as your signal. Make sure to do this in a new
window - ideally if the input is done in window x,
send this tuple in window x+1.
 - When you receive this tuple on the Writer operator,
you can perform the write operation on the external
system.

~ Bhupesh

On Sat, Dec 3, 2016 at 3:56 AM, Vishal Agrawal
> wrote:

Hi,

I am performing a batch operation. My input
operator is reading multiple files line by line
and then there are bunch of operators manipulating
the records to evaluate result.
My output operator is supposed to write the final
result to external system once all the records
from each of the files are processed.

On completion of reading all the files, how can I
trigger an event which will inform my output
operator to perform the write operation on
external system.


Thanks,
Vishal























Re: Packaging Operator Jars

2016-11-30 Thread Vlad Rozov
It is necessary to exclude hadoop jars from the application package. 
Please see 
http://docs.datatorrent.com/troubleshooting/#hadoop-dependencies-conflicts


Thank you,

Vlad

On 11/30/16 12:16, Doyle, Austin O. wrote:


I was wondering if there was a way to package operator jars with their 
dependencies and put them into the lib directory for an apa file, and 
if I were able to do that are there additional files that need to be 
added apart from the packaged operator jars?  I tried with a few 
different systems and always get something that’s missing such as 
 java.lang.NoSuchMethodError: 
org.apache.hadoop.yarn.api.records.ContainerId.fromString.  I ensure 
that I have the hadoop yarn jars of version 2.6.4 in the lib folder as 
well but it doesn’t seem to help.


Thanks!

Austin





Re: Connection refused exception

2016-11-30 Thread Vlad Rozov
Hmm, that is strange. I would expect an ERROR in the KafkaInputOperator 
around 21:01. Please check the earliest killed container log.


2016-11-24 21:01:01,839 [IPC Server handler 4 on 44453] ERROR 
stram.StreamingContainerManager processOperatorFailure - Initiating 
container restart after operator failure 
PTOperator[id=1,name=kafkaInputOperator]


Thank you,

Vlad

On 11/30/16 07:17, Feldkamp, Brandon (CONT) wrote:


This is the only other stacktrace I could find but it’s dated after 
the initial cause of failure.


2016-11-24 21:52:24,681 [14/kafkaInputOperator:KafkaInputOperator] ERROR engine.StreamingContainer run - Operator set [OperatorDeployInfo[id=14,name=kafkaInputOperator,type=INPUT,checkpoint={58378663239f, 0, 0},inputs=[],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=outputPort,streamId=KafkaInput -> AuthParser,bufferServer=ip-200-120-36 stopped running due to an exception.

java.lang.RuntimeException: replay
at com.datatorrent.contrib.kafka.AbstractKafkaInputOperator.replay(AbstractKafkaInputOperator.java:330)
at com.datatorrent.contrib.kafka.AbstractKafkaInputOperator.beginWindow(AbstractKafkaInputOperator.java:266)
at com.datatorrent.stram.engine.InputNode.run(InputNode.java:122)
at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)
Caused by: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at kafka.utils.Utils$.read(Utils.scala:376)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
at kafka.javaapi.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:47)
at com.datatorrent.contrib.kafka.AbstractKafkaInputOperator.replay(AbstractKafkaInputOperator.java:304)
... 3 more

2016-11-24 21:52:24,690 [14/kafkaInputOperator:KafkaInputOperator] ERROR engine.StreamingContainer run - Shutdown of operator OperatorDeployInfo[id=14,name=kafkaInputOperator,type=INPUT,checkpoint={58378663239f, 0, 0},inputs=[],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=outputPort,streamId=KafkaInput -> AuthParser,bufferServer=ip-200-120-36]]] failed due to an exception.

java.lang.NullPointerException
at com.datatorrent.contrib.kafka.SimpleKafkaConsumer.close(SimpleKafkaConsumer.java:348)
at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:303)
at com.datatorrent.contrib.kafka.KafkaConsumer.stop(KafkaConsumer.java:157)
at com.datatorrent.contrib.kafka.AbstractKafkaInputOperator.deactivate(AbstractKafkaInputOperator.java:405)
at com.datatorrent.stram.engine.Node.deactivate(Node.java:646)
at com.datatorrent.stram.engine.StreamingContainer.teardownNode(StreamingContainer.java:1347)
at com.datatorrent.stram.engine.StreamingContainer.access$500(StreamingContainer.java:130)
at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1438)


*From: *Vlad Rozov <v.ro...@datatorrent.com>
*Organization: *DataTorrent
*Reply-To: *"users@apex.apache.org" <users@apex.apache.org>
*Date: *Tuesday, November 29, 2016 at 10:21 PM
*To: *"users@apex.apache.org" <users@apex.apache.org>
*Subject: *Re: Connection refused exception

Brandon,

Is the stacktrace from the app master log or is it from a local mode 
run? The exception is raised due to the buffer server termination in 
one of the operators containers and I guess that it happens when app 
master tries to purge data from the buffer server. Check the operators 
containers logs for exceptions/errors unless th

Re: Connection refused exception

2016-11-29 Thread Vlad Rozov

Brandon,

Is the stacktrace from the app master log or is it from a local mode 
run? The exception is raised due to the buffer server termination in one 
of the operators containers and I guess that it happens when app master 
tries to purge data from the buffer server. Check the operators 
containers logs for exceptions/errors unless the application was started 
in the local mode. In the later case, try to enable DEBUG level.


Thank you,

Vlad

On 11/29/16 05:54, Feldkamp, Brandon (CONT) wrote:


Hello,

I’m wondering if anyone has seen a similar stack trace before as there 
isn’t a lot of info provided. I’m wondering if the connection refused 
is from one operator to another or the kafkaInputOperator being unable 
to connect to kafka.


Any ideas? Here’s the stacktrace:

2016-11-24 21:00:51,998 [ProcessWideEventLoop] ERROR netlet.AbstractClient handleException - Exception in event loop {id=ProcessWideEventLoop, head=7418, tail=7416, capacity=1024}

java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at com.datatorrent.netlet.DefaultEventLoop.handleSelectedKey(DefaultEventLoop.java:371)
at com.datatorrent.netlet.OptimizedEventLoop$SelectedSelectionKeySet.forEach(OptimizedEventLoop.java:59)
at com.datatorrent.netlet.OptimizedEventLoop.runEventLoop(OptimizedEventLoop.java:192)
at com.datatorrent.netlet.OptimizedEventLoop.runEventLoop(OptimizedEventLoop.java:157)
at com.datatorrent.netlet.DefaultEventLoop.run(DefaultEventLoop.java:156)
at java.lang.Thread.run(Thread.java:745)

2016-11-24 21:01:01,839 [IPC Server handler 4 on 44453] ERROR stram.StreamingContainerManager processOperatorFailure - Initiating container restart after operator failure PTOperator[id=1,name=kafkaInputOperator]

2016-11-24 21:01:01,898 [IPC Server handler 29 on 44453] ERROR stram.StreamingContainerManager processOperatorFailure - Initiating container restart after operator failure PTOperator[id=7,name=kafkaInputOperator]

2016-11-24 21:01:32,991 [IPC Server handler 24 on 44453] ERROR stram.StreamingContainerManager processOperatorFailure - Initiating container restart after operator failure PTOperator[id=13,name=kafkaInputOperator]

2016-11-24 21:01:44,189 [IPC Server handler 22 on 44453] ERROR stram.StreamingContainerManager processOperatorFailure - Initiating container restart after operator failure PTOperator[id=10,name=kafkaInputOperator]

2016-11-24 21:01:44,604 [IPC Server handler 5 on 44453] ERROR stram.StreamingContainerManager processOperatorFailure - Initiating container restart after operator failure PTOperator[id=3,name=kafkaInputOperator]

2016-11-24 21:01:44,744 [IPC Server handler 16 on 44453] ERROR stram.StreamingContainerManager processOperatorFailure - Initiating container restart after operator failure PTOperator[id=12,name=kafkaInputOperator]


Thanks!

Brandon










Re: error launching in docker sandbox

2016-11-29 Thread Vlad Rozov

Hi Brandon,

No, it should not. Start apex with the - option; that will provide more 
details in the log.


Thank you,

Vlad

On 11/29/16 14:07, Feldkamp, Brandon (CONT) wrote:


Hello!

I’m trying to launch my app in the docker sandbox but I’m getting an 
error. I’m able to launch the app elsewhere using the datatorrent 
community edition…would the difference between apex/datatorrent cause 
this error?


com.datatorrent.stram.cli.ApexCli$CliException: No applications in Application Package

at com.datatorrent.stram.cli.ApexCli.getLaunchAppPackageArgs(ApexCli.java:3471)
at com.datatorrent.stram.cli.ApexCli.launchAppPackage(ApexCli.java:3445)
at com.datatorrent.stram.cli.ApexCli.access$7400(ApexCli.java:151)
at com.datatorrent.stram.cli.ApexCli$LaunchCommand.execute(ApexCli.java:1900)
at com.datatorrent.stram.cli.ApexCli$3.run(ApexCli.java:1462)

Thanks!

Brandon










Re: balanced of Stream Codec

2016-10-17 Thread Vlad Rozov
Using a different hash function will help only if the data is equally 
distributed across categories. In many cases the data is skewed and some 
categories occur more frequently than others. In such cases a generic hash 
function will not help. Can you try to sample the data and see if it 
is equally distributed across categories?


Vlad
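
For reference, a sketch of the Guava murmur3 approach Pramod suggests below; 
the InputTuple stand-in and its getName() accessor are assumptions modelled 
on the code later in this thread. As noted above, a different hash function 
only helps if the categories themselves are evenly distributed.

import java.nio.charset.StandardCharsets;

import com.datatorrent.lib.codec.KryoSerializableStreamCodec;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

public class CategoryStreamCodec extends KryoSerializableStreamCodec<CategoryStreamCodec.InputTuple>
{
  private static final long serialVersionUID = 1L;

  // static so the codec stays serializable without dragging the hash function along
  private static final HashFunction MURMUR = Hashing.murmur3_32();

  // minimal stand-in for the tuple type used in this thread
  public static class InputTuple
  {
    private String name;

    public String getName()
    {
      return name;
    }

    public void setName(String name)
    {
      this.name = name;
    }
  }

  @Override
  public int getPartition(InputTuple tuple)
  {
    String key = tuple.getName();
    if (key == null) {
      return 0;
    }
    // murmur3 tends to spread similar keys better than String.hashCode(),
    // but it cannot compensate for a skewed category distribution
    return MURMUR.hashBytes(key.getBytes(StandardCharsets.UTF_8)).asInt();
  }
}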


On 10/16/16 10:40, Pramod Immaneni wrote:

Hi Sunil,

Have you tried an alternate hashing function other than java hashcode 
that might provide a more uniform distribution of your data? The 
google guava library provides a set of hashing strategies, like murmur 
hash, that is reported to have lesser hash collisions in different 
cases. Below is a link explaining these from their website


https://github.com/google/guava/wiki/HashingExplained

Here is a link where someone has done a comparative study of different 
hashing functions

http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

If you end up choosing hashing function from google guava library, 
make sure you use the documentation from guava version 11.0 as this 
version of guava is already included in Hadoop classpath.


Thanks

On Fri, Oct 14, 2016 at 1:17 PM, Sunil Parmar 
> wrote:


We're using a stream codec for consistent parallel processing of
the data across the operator partitions. Our requirement is to
serialize processing of the data based on a particular tuple
attribute, let's call it 'catagory_name'. In order to achieve
parallel processing of different category names we've written our
stream codec as follows.

   public class CatagoryStreamCodec extends KryoSerializableStreamCodec {

       private static final long serialVersionUID = -687991492884005033L;

       @Override
       public int getPartition(Object in) {
           try {
               InputTuple tuple = (InputTuple) in;
               String partitionKey = tuple.getName();
               if (partitionKey != null) {
                   return partitionKey.hashCode();
               }
           } catch (Exception e) {
               // fall through to the default partition
           }
           return 0;
       }
   }

It's working as expected, *but* we observed inconsistent partitioning
when we ran this in the production environment with 20 partitions of the
operator following the codec in the DAG.

  * Some operator instance didn’t process any data
  * Some operator instance process as many tuples as combined
everybody else


Questions :

  * getPartition method supposed to return the actual partition or
just some lower bit used for deciding partition ?
  * Number of partitions is known to application properties and
can vary between deployments or environments. Is it best
practice to use that property in the stream codec ?
  * Any recommended hash function for getting consistent
variations in the lower bit with less variety of data. we’ve
~100+ categories and I’m thinking to have 10+ operator
partitions.


Thanks,
Sunil






Re: datatorrent gateway application package problem

2016-09-28 Thread Vlad Rozov
Check namenode logs. Are there any exceptions/errors raised at the same 
time as the gateway exception?


On 9/28/16 14:43, d...@softiron.co.uk wrote:

On 2016-09-28 14:09, David Yan wrote:

dtgateway does not use the native libraries.

I just tried a clean install of RTS 3.5.0 community edition on plain
vanilla hadoop 2.7.3 on ubuntu 14.04 and was able to launch the pi
demo from the web interface. I was able to do the same even with the
native libraries removed.


Thanks.  I thought it was a long shot.


What is your fs.defaultFS property in hadoop? Can you verify that it
points to your hdfs (hdfs://namenode:9000/) ?


In $HADOOP_CONF_DIR/hdfs-site.xml, fs.defaultFS is hdfs://namenode:9000

Note also that if I try to give a URL to the dtgateway Installation 
Wizard

that isn't the namenode host/port, it complains immediately, instead of
proceeding to the next page.


Unfortunately I don't have access to an ARM64 so any finding you post
here would be helpful.


I'm not surprised.  A classic chicken-and-egg problem.  There aren't a 
lot of machines around, so the software doesn't get a lot of 
attention, so people don't buy machines.





David

On Wed, Sep 28, 2016 at 11:04 AM,  wrote:


I had a thought of what might be the problem:
the hadoop native library doesn't build on SUSE/SLES ARM64,
so I'm running with the pure Java library.

Is it possible that the dtgateway is trying to use the native
library
directly?

Thanks,
-david

On 2016-09-26 14:45, David Yan wrote:

Do you see any exception stacktrace in the log when this error
occurred:

Wrong FS: hdfs://namenode:9000/user/dtadmin/datatorrent, expected:
file:///

David

On Mon, Sep 26, 2016 at 2:39 PM,  wrote:

From: David Yan
Date: Sat 00:12

Can you please provide any exception stacktrace in the
dtgateway.log file when that happens?

I reran the dtgateway installation wizard, which failed (same as
before) with error msg:

| DFS directory cannot be written to with error message "Mkdirs
failed to create /user/dtadmin/datatorrent (exists=false,
cwd=file:/opt/datatorrent/releases/3.4.0)"

The corresponding exception in dtgateway.log is:

| 2016-09-26 13:43:54,767 ERROR com.datatorrent.gateway.I: DFS Directory cannot be written to with exception:
| java.io.IOException: Mkdirs failed to create /user/dtadmin/datatorrent (exists=false, cwd=file:/opt/datatorrent/releases/3.4.0)
| at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
| at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
| at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
| at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
| at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
| at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
| at com.datatorrent.stram.client.FSAgent.createFile(FSAgent.java:77)
| at com.datatorrent.gateway.I.h(gc:324)
| at com.datatorrent.gateway.I.h(gc:284)
| at com.datatorrent.gateway.resources.ws.v2.ConfigResource.h(fc:136)
| at com.datatorrent.gateway.resources.ws.v2.ConfigResource.h(fc:171)
| at com.datatorrent.gateway.resources.ws.v2.ConfigResource.setConfigProperty(fc:34)
| at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
| at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
| at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
| at java.lang.reflect.Method.invoke(Method.java:498)
| at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
| at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
| at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
| at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
| at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
| at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:134)
| at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
| at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:134)
| at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
| at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
| at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
| at



Re: Apex Application Error

2016-09-05 Thread Vlad Rozov
Please check your application package (.apa) and application pom.xml for 
hadoop/yarn libraries. All hadoop dependencies must be excluded as they 
are provided at run-time.


Vlad

On 9/4/16 16:14, JOHN, BIBIN wrote:


Could you please help me to identify the root  cause for below failure?

Container: container_e11_1472055988122_0035_02_01 on 
zlx71303.vci.att.com_45454


=

LogType:AppMaster.stderr

Log Upload Time:Wed Aug 31 10:44:51 -0700 2016

LogLength:1259

Log Contents:

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/data/data02/yarn/local/usercache/bj9306/appcache/application_1472055988122_0035/filecache/24/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.IllegalArgumentException: Invalid ContainerId: container_e11_1472055988122_0035_02_01
at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:91)
Caused by: java.lang.NumberFormatException: For input string: "e11"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
... 1 more

End of LogType:AppMaster.stderr

//





Re: DT Exception while streaming byte array to object array

2016-08-18 Thread Vlad Rozov
These errors mean that a downstream operator exited without 
closing/disconnecting the socket to the buffer server. Please check the 
application master log to identify the container/operator that fails first.


Vlad

On 8/18/16 10:57, Mukkamula, Suryavamshivardhan (CWM-NR) wrote:

Hi Team,
Can you please throw some light as why I am getting these exceptions. 
We are trying to stream input file into byte array and converting them 
to object arrays. We are getting these exceptions while doing 
converting byte array to object array.
2016-08-18 12:36:05,272 INFO  engine.StreamingContainer 
(StreamingContainer.java:main(292)) - Child starting with classpath: 

Re: Information Needed

2016-08-02 Thread Vlad Rozov
Both setup() and beginWindow() should work. It is more correct to 
open the configuration stream and parse the configuration file in 
setup(), as you tried in the initial implementation, as long as the 
configuration path does not depend on the window id. Where is the 
inputConfStream used? Most likely it reaches EOF unexpectedly.


Vlad
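
A minimal sketch of reading the configuration once in setup() and keeping 
the parsed lines, rather than an open stream, as operator state; the class 
name ConfigAwareOperator and the inputConfFile property are assumptions 
loosely modelled on the getInputConfFile() accessor used in the thread below. 
Since setup() runs again whenever the operator is redeployed or recovered, 
the transient list is simply rebuilt.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.common.util.BaseOperator;

public class ConfigAwareOperator extends BaseOperator
{
  private String inputConfFile; // HDFS path to the configuration file
  // parsed configuration kept as operator state instead of an open stream
  private transient List<String> confLines;

  @Override
  public void setup(OperatorContext context)
  {
    confLines = new ArrayList<>();
    // newInstance() returns a private FileSystem, so closing it here does not
    // disturb the cached FileSystem other parts of the container may use
    try (FileSystem fs = FileSystem.newInstance(new Configuration());
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(fs.open(new Path(inputConfFile)), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        confLines.add(line);
      }
    } catch (IOException e) {
      throw new RuntimeException("Unable to read configuration file " + inputConfFile, e);
    }
  }

  public void setInputConfFile(String inputConfFile)
  {
    this.inputConfFile = inputConfFile;
  }
}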


On 8/2/16 12:19, Mukkamula, Suryavamshivardhan (CWM-NR) wrote:

Hi Team,
When I am trying to read input line from feed, to parse the line I am 
reading another configuration file from HDFS. To avoid reading the 
configuration file for every line I would like to read it in the 
beginWindow() method. But the Input stream is getting closed and 
operator is not holding the stream for all the tuples.
Can I read the input Stream Once for all the tuples? (I tried in the 
setup() method as well , but no luck)

@Override
public void beginWindow(long windowId)
{
  super.beginWindow(windowId);
  try {
    inputConfStream = getFS().open(new Path(getInputConfFile()));
  } catch (Exception e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
    LOG.error("beginWindow: Error while streaming the input Configuration File = {}", getInputConfFile());
  }
}
Regards,
Surya Vamshi
