Re: DataTorrent with SBT: .apa file not created

2016-07-11 Thread Ankit Sarraf
I am using the sbt-assembly plugin. I've added a screenshot for reference:

[image: Inline image 1]

To give a better view of things:
The java directory inside src is the DT application.
The scala directory is the Random Kafka Generator.

And on running sbt assembly, only AlertDeterminer-assembly-1.0.jar is
created inside the target/scala-2.11/ directory.

To give more information, these are the contents of build.sbt:

name := "AlertDeterminer"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka_2.11" % "0.9.0.1" % "provided",
  "org.apache.apex" % "malhar-library" % "3.4.0" % "provided",
  "org.apache.apex" % "malhar-contrib" % "3.4.0" % "provided"
)

resolvers += Resolver.sonatypeRepo("public")
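
And sbt-assembly itself is enabled in project/plugins.sbt. A minimal sketch,
with the plugin version as an assumption (use whatever version you have):

// project/plugins.sbt -- enables the sbt-assembly plugin used above
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")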

Also, are you sure I need to use the sbt-maven-plugin? The reason I am
asking is that it works the other way round, i.e. from pom.xml to build.sbt.

Meanwhile, I am looking into the 2nd option.

Thanks
Ankit.


On Mon, Jul 11, 2016 at 2:56 PM, hsy...@gmail.com  wrote:

> I've never used SBT to build an Apex application, but I guess you can try 2
> things here:
> use the sbt-maven-plugin
> https://github.com/shivawu/sbt-maven-plugin
> or use the sbt-assembly plugin
> https://github.com/sbt/sbt-assembly
>
> In the 2nd way, you need to translate the plugin configuration part of
> pom.xml into sbt scripts.
> The configuration usually looks like this.
>
> I hope this helps
>
> <plugin>
>   <artifactId>maven-assembly-plugin</artifactId>
>   <executions>
>     <execution>
>       <id>app-package-assembly</id>
>       <phase>package</phase>
>       <goals>
>         <goal>single</goal>
>       </goals>
>       <configuration>
>         <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
>         <appendAssemblyId>false</appendAssemblyId>
>         <descriptors>
>           <descriptor>src/assemble/appPackage.xml</descriptor>
>         </descriptors>
>         <archiverConfig>
>           <defaultDirectoryMode>0755</defaultDirectoryMode>
>         </archiverConfig>
>         <archive>
>           <manifestEntries>
>             <Class-Path>${apex.apppackage.classpath}</Class-Path>
>             <DT-Engine-Version>${apex.core.version}</DT-Engine-Version>
>             <DT-App-Package-Group-Id>${apex.apppackage.groupid}</DT-App-Package-Group-Id>
>             <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
>             <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
>             <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
>             <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
>           </manifestEntries>
>         </archive>
>       </configuration>
>     </execution>
>   </executions>
> </plugin>
>
>
> On Mon, Jul 11, 2016 at 2:01 PM, Ankit Sarraf wrote:
>
>> I am using SBT to create a DataTorrent application. The project comprises
>> 2 parts: part 1 is a Random Kafka Generator built using Scala, and part 2 is
>> the DataTorrent application (Java) that ingests the data, processes it, and
>> writes it to HDFS.
>>
>> There are no errors while running sbt assembly.
>>
>> However, although the uber JAR is created successfully, the .apa file is not.
>> So, does DataTorrent work with SBT?
>>
>> Thanks
>> Ankit.
>>
>
>


Re: DataTorrent with SBT: .apa file not created

2016-07-11 Thread hsy...@gmail.com
I've never used SBT to build an Apex application, but I guess you can try 2
things here:
use the sbt-maven-plugin
https://github.com/shivawu/sbt-maven-plugin
or use the sbt-assembly plugin
https://github.com/sbt/sbt-assembly

In the 2nd way, you need to translate the plugin configuration part of
pom.xml into sbt scripts.
The configuration usually looks like this.

I hope this helps

<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <id>app-package-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <finalName>${project.artifactId}-${project.version}-apexapp</finalName>
        <appendAssemblyId>false</appendAssemblyId>
        <descriptors>
          <descriptor>src/assemble/appPackage.xml</descriptor>
        </descriptors>
        <archiverConfig>
          <defaultDirectoryMode>0755</defaultDirectoryMode>
        </archiverConfig>
        <archive>
          <manifestEntries>
            <Class-Path>${apex.apppackage.classpath}</Class-Path>
            <DT-Engine-Version>${apex.core.version}</DT-Engine-Version>
            <DT-App-Package-Group-Id>${apex.apppackage.groupid}</DT-App-Package-Group-Id>
            <DT-App-Package-Name>${project.artifactId}</DT-App-Package-Name>
            <DT-App-Package-Version>${project.version}</DT-App-Package-Version>
            <DT-App-Package-Display-Name>${project.name}</DT-App-Package-Display-Name>
            <DT-App-Package-Description>${project.description}</DT-App-Package-Description>
          </manifestEntries>
        </archive>
      </configuration>
    </execution>
  </executions>
</plugin>
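
If you stay with sbt-assembly end to end, a rough equivalent of that assembly
step can be scripted in build.sbt. The sketch below is only an assumption, not
an official integration: it targets sbt 0.13, the task name apa is made up, the
manifest entries simply mirror the Maven configuration above, and a real app
package may also need META-INF/properties.xml and a conf/ directory in the zip.

// build.sbt (sketch): zip the uber jar into a DataTorrent-style .apa package.
lazy val apa = taskKey[File]("Package the assembly jar as an .apa (zip) archive")

apa := {
  val jar = assembly.value                      // uber jar built by sbt-assembly
  val mfFile = target.value / "APA-MANIFEST.MF"
  // Mirror the manifestEntries of the Maven config above (values illustrative).
  IO.write(mfFile,
    s"""Manifest-Version: 1.0
       |DT-Engine-Version: 3.4.0
       |DT-App-Package-Name: ${name.value}
       |DT-App-Package-Version: ${version.value}
       |DT-App-Package-Display-Name: ${name.value}
       |""".stripMargin)
  val out = target.value / s"${name.value}-${version.value}.apa"
  // An .apa file is a plain zip: the manifest plus the application jar under app/.
  IO.zip(Seq(mfFile -> "META-INF/MANIFEST.MF", jar -> s"app/${jar.getName}"), out)
  out
}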


On Mon, Jul 11, 2016 at 2:01 PM, Ankit Sarraf wrote:

> I am using SBT to create a DataTorrent application. The project comprises
> 2 parts: part 1 is a Random Kafka Generator built using Scala, and part 2 is
> the DataTorrent application (Java) that ingests the data, processes it, and
> writes it to HDFS.
>
> There are no errors while running sbt assembly.
>
> However, although the uber JAR is created successfully, the .apa file is not.
> So, does DataTorrent work with SBT?
>
> Thanks
> Ankit.
>


RE: Inputs needed on File Writer

2016-07-11 Thread Mukkamula, Suryavamshivardhan (CWM-NR)
Thank you Priyanka.

One suggestion needed from the DataTorrent team: in our use case we need to read
around 120 directories in parallel, and we would like to keep operator memory (with
container-local deployment) as low as possible to reduce the burden on the cluster.
As long as cluster resources are sufficient, we can run the DT application
continuously with a pre-defined scan interval.

My concern is: can we run the DT application in batch mode with external tools like
Oozie? We want to start and stop the application at a predefined time each day
instead of keeping it running continuously, in order to reduce the burden on the
cluster during peak hours. I heard that a DT application will release its memory by
default when not in use, so we don't need to worry when the application is not
streaming.

Please throw some light on this.

Regards,
Surya Vamshi
From: Priyanka Gugale [mailto:priya...@datatorrent.com]
Sent: 2016, July, 11 7:11 AM
To: users@apex.apache.org
Subject: Re: Inputs needed on File Writer

Hi,

Check this app: https://github.com/apache/apex-malhar/tree/master/apps/filecopy
It is for HDFS-to-HDFS copy, but I could use the same app to copy from HDFS to FTP,
as the HDFS API supports FTP as well.

Please note the following property I used to run the app:

<property>
  <name>dt.operator.HDFSFileCopyModule.prop.outputDirectoryPath</name>
  <value>ftp://ftpadmin:ftpadmin@localhost:21/home/ftp/ftpadmin/out</value>
</property>

-Priyanka

On Sat, Jul 9, 2016 at 12:33 PM, Priyanka Gugale wrote:

I'm traveling over the weekend and will get back on Monday.

-Priyanka
On Jul 8, 2016 8:21 PM, "Mukkamula, Suryavamshivardhan (CWM-NR)" wrote:
Thank you Priyanka. Do you have an example that uses this operator for FTP?

Regards,
Surya Vamshi

From: Priyanka Gugale [mailto:priya...@datatorrent.com]
Sent: 2016, July, 08 10:48 AM
To: users@apex.apache.org
Subject: RE: Inputs needed on File Writer


Yes, FTP is supported but not SFTP.

-Priyanka
On Jul 8, 2016 7:00 PM, "Mukkamula, Suryavamshivardhan (CWM-NR)" wrote:
Hi Priyanka,

Thank you for your inputs.

It may be a dumb question, but in my previous communications I heard from
DataTorrent that SFTP is not supported for now. Does that mean FTP is supported
and SFTP is not? Please clarify the difference.

Regards,
Surya Vamshi

From: Priyanka Gugale [mailto:priya...@datatorrent.com]
Sent: 2016, July, 08 12:07 AM
To: users@apex.apache.org
Subject: Re: Inputs needed on File Writer

Hi,

The file will be available after the window is committed. You can override the
committed() callback and start your thread after super.committed() is called. You
might want to double-check that the file has actually been finalized before
starting your thread.

For your use case, I would suggest using AbstractFileOutputOperator to write the
file directly to FTP.
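
Roughly, the committed() override could look like the sketch below. This is only
a sketch: the class name and the FTP copy stub are placeholders, and KeyValue is
assumed to be your own tuple class with public String key/value fields:

import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;

// Sketch: trigger the FTP copy only once files have been finalized (committed).
public class FtpNotifyingFileWriter
    extends AbstractFileOutputOperator<KeyValue<String, String>> {

  @Override
  public void committed(long windowId) {
    super.committed(windowId);  // let the operator finalize its files first
    // Files written up to windowId are now finalized on HDFS; safe to ship.
    new Thread(new Runnable() {
      @Override
      public void run() {
        copyFinalizedFilesToFtp();
      }
    }).start();
  }

  private void copyFinalizedFilesToFtp() {
    // Placeholder: read the finalized HDFS files and push them to the FTP server.
  }

  @Override
  protected String getFileName(KeyValue<String, String> tuple) {
    return tuple.key;
  }

  @Override
  protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
    return tuple.value.getBytes();
  }
}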

-Priyanka

On Fri, Jul 8, 2016 at 12:41 AM, Mukkamula, Suryavamshivardhan (CWM-NR) wrote:
Hi,

Can you please let me know what happens when the requestFinalize() method is
called, as per the code below?

Once the output files are written to HDFS, I would like to initiate a thread
that reads the HDFS files and copies them to an FTP location, so I am trying to
understand when I can trigger the thread.

### File Writer ##

package com.rbc.aml.cnscan.operator;

import com.datatorrent.api.Context;
import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;
import com.rbc.aml.cnscan.utils.KeyValue;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FileWriter extends AbstractFileOutputOperator<KeyValue<String, String>> {
    private static final Logger LOG = LoggerFactory.getLogger(FileWriter.class);
    // Files whose writing is complete; queued here until they can be finalized.
    private List<String> filesToFinalize = new ArrayList<>();

    @Override
    public void setup(Context.OperatorContext context) {
        super.setup(context);
        finalizeFiles();
    }

    @Override
    protected byte[] getBytesForTuple(KeyValue<String, String> tuple) {
        // A tuple with a null value is a marker that its file is complete.
        if (tuple.value == null) {
            LOG.debug("File to finalize {}", tuple.key);
            filesToFinalize.add(tuple.key);
            return new byte[0];
        } else {
            return tuple.value.getBytes();
        }
    }

    @Override
    protected String getFileName(KeyValue<String, String> tuple) {
        return tuple.key;
    }

    @Override
    public void endWindow() {
        LOG.info("end window is called, files are: {}", filesToFinalize);
        super.endWindow();
        finalizeFiles();
    }

    private void finalizeFiles() {